Hi All,
I have set up a Portworx cluster (backed by Ceph, on an OpenStack private cloud) on my Kubernetes cluster, and I'm seeing a strange phenomenon: Portworx consistently goes down whenever I attach a Kafka pod to it. The error log says the KVDB disconnected and the node is then waiting to join quorum on port 9002. The whole cluster goes down and I have to wait about 30 minutes for Portworx to reinitialize… which is not a great experience.
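For reference, this is roughly how I check node and KVDB state while it's out of quorum (a sketch; I'm assuming the default kube-system namespace and the standard name=portworx pod label from an Operator install):

```sh
# Grab one Portworx pod (namespace and label assumed from a default install)
PX_POD=$(kubectl get pods -n kube-system -l name=portworx \
  -o jsonpath='{.items[0].metadata.name}')

# Overall node status; this is where "waiting to join quorum" shows up
kubectl exec -n kube-system "$PX_POD" -- /opt/pwx/bin/pxctl status

# Internal KVDB membership and health
kubectl exec -n kube-system "$PX_POD" -- /opt/pwx/bin/pxctl service kvdb members
```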
Portworx was set up with the Portworx Operator, with a 32 GB KVDB device (/dev/vdb) and a 150 GB storage device (/dev/vdc).
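The relevant part of my StorageCluster spec looks roughly like this (a sketch; the cluster name and image tag are placeholders, the device paths are my real ones):

```sh
# Sketch of the StorageCluster applied via the Operator
# (metadata.name and spec.image are placeholders)
kubectl apply -f - <<EOF
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster                      # placeholder name
  namespace: kube-system
spec:
  image: portworx/oci-monitor:2.13.0    # placeholder tag
  kvdb:
    internal: true
  storage:
    kvdbDevice: /dev/vdb                # 32 GB internal-KVDB device
    devices:
      - /dev/vdc                        # 150 GB storage device
EOF
```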
I tried wiping the cluster following the guide here (Uninstall Portworx from a Kubernetes cluster using the DaemonSet) and rebuilding, but I still hit the same issue.
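If I followed the guide correctly, the wipe step boils down to the px-wipe helper script (this is my reading of the documented procedure; run from a machine with kubectl access to the cluster):

```sh
# Cluster-wipe helper script referenced by the uninstall guide
curl -fsL https://install.portworx.com/px-wipe | bash
```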
Error log:
PX is not running on host: Could not reach 'HealthMonitor'
Error while calling home: KVDB connection failed, either node has networking issues or KVDB members are down or KVDB cluster is unhealthy. All operations (get/update/delete) are unavailable.
Failed to get node status warnings: couldn't get: /nodestatuswarnings with error: Get "http://localhost:9001/nodestatuswarnings": dial tcp 127.0.0.1:9001: connect: connection refused
kvdb error: context deadline exceeded, retry count 3
Hope someone can give me a clue to solving this issue…
Thanks.