Portworx port 9002 keeps getting disconnected

Hi All,

I have set up a Portworx cluster on my Kubernetes cluster, backed by Ceph on an OpenStack private cloud, and I am seeing some strange behavior. Portworx consistently goes down whenever I attach a Kafka pod to it. The error log says the kvdb was disconnected and that the node is then waiting to join the quorum on port 9002. The whole cluster goes down and I have to wait 30 minutes for the Portworx cluster to reinitialize, which is not a good experience.

Portworx is deployed with the Portworx Operator, with a 32GB kvdb disk (/dev/vdb) and a 150GB storage disk (/dev/vdc).

I tried wiping the cluster by following the guide "Uninstall Portworx from a Kubernetes cluster using the DaemonSet" and rebuilding it, but I still encounter the same issue.

Error log:
PX is not running on host: Could not reach ‘HealthMonitor’

Error while calling home: KVDB connection failed, either node has networking issues or KVDB members are down or KVDB cluster is unhealthy. All operations (get/update/delete) are unavailable.

Failed to get node status warnings: couldn’t get: /nodestatuswarnings with error: Get “http://localhost:9001/nodestatuswarnings”: dial tcp connect: connection refused

kvdb error: context deadline exceeded, retry count 3
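
Since the log points at either a networking problem or an unhealthy kvdb, one way to separate the two is to probe the ports named in the log (9001 and 9002) from each node while the Kafka pod is attached. Below is a minimal sketch in Python. The node addresses in NODES are hypothetical placeholders, not the real cluster IPs, and the check only tests TCP reachability; it says nothing about kvdb health itself.

```python
import socket

# Ports taken from the error log above.
PORTS = (9001, 9002)
# Hypothetical node addresses; replace with the real Portworx node IPs.
NODES = ("10.0.0.11", "10.0.0.12", "10.0.0.13")


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def probe(nodes=NODES, ports=PORTS, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): port_open(h, p, timeout) for h in nodes for p in ports}


if __name__ == "__main__":
    for (host, port), ok in probe().items():
        print(f"{host}:{port} {'open' if ok else 'unreachable'}")
```

If the ports are reachable from every node before Kafka is attached and become unreachable afterwards, that would suggest looking at the network or at I/O load on the nodes rather than at the kvdb configuration.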

I hope someone can give me a clue on how to solve this issue.