Portworx pod keeps showing Readiness probe failed

I have created a Kubernetes cluster on top of OpenStack following the guide at
https://thenewstack.io/run-stateful-containerized-workloads-with-rancher-kubernetes-engine-and-portworx, and I want to deploy Portworx in the kube-system namespace following the steps at https://install.portworx.com/, choosing the specification that suits my environment.
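
For reference, the spec can be applied straight from the generator URL; a minimal sketch (the query parameters below are illustrative placeholders, not my exact spec):

  # apply the DaemonSet spec generated at install.portworx.com for this cluster
  kubectl apply -f 'https://install.portworx.com/2.6?kbver=1.18.6&c=px-cluster&s=%2Fdev%2Fvdb'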

However, the portworx and portworx-api pods keep showing
Readiness probe failed: HTTP probe failed with statuscode: 503
when I run the kubectl describe command.
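
For example, the probe failures show up when describing the pod (pod name taken from the listing below):

  kubectl describe pod portworx-9d5wc -n kube-system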

The output looks like this:

NAMESPACE     NAME                 READY   STATUS    RESTARTS   AGE
kube-system   portworx-9d5wc       0/1     Running   3          47m
kube-system   portworx-api-8rm9f   0/1     Running   0          47m

READY never reaches 1/1, even though STATUS is Running.
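
(The listing above can be reproduced with something like the following, filtered to the Portworx pods:)

  kubectl get pods -A -o wide | grep portworx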

Could anyone help with this? I am new to Portworx configuration.

Events from kubectl describe:
Normal PortworxMonitorImagePullInPrgress 29m portworx Portworx image portworx/px-essentials:2.6.3 pull and extraction in progress
Normal PortworxMonitorImagePullInPrgress 14m portworx Portworx image portworx/px-essentials:2.6.3 pull and extraction in progress
Warning Unhealthy 29s (x255 over 44m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503

How many nodes are in your cluster? Portworx needs a minimum of 3 nodes to form a cluster quorum, so you should have 3 portworx-api pods. That could be why the Portworx readiness probe waits 10 seconds and then fails. Let's get the details below for further insight:

  PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')
  kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

and

kubectl logs -n kube-system -l name=portworx --tail=99999

There is one master node and three worker nodes in the cluster.
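
(Node count taken from kubectl, for reference:)

  kubectl get nodes -o wide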

Output of
kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

Type ID Resource Severity Count LastSeen FirstSeen Description

NODE NodeStateChange dab8e483-4eba-44b0-8a80-474cbbf0596a ALARM 1 Feb 8 07:49:42 UTC 2021 Feb 8 07:49:42 UTC 2021 Node is not in quorum. Waiting to connect to peer nodes on port 9002.

NODE NfsDependencyNotEnabled dragon-worker1 ALARM 1 Feb 8 06:50:48 UTC 2021 Feb 8 06:50:48 UTC 2021 Could not enable NFS service

NODE NfsDependencyInstallFailure dragon-worker1 ALARM 1 Feb 8 06:50:38 UTC 2021 Feb 8 06:50:38 UTC 2021 Could not install NFS service: Command 'DEBIAN_FRONTEND=noninteractive apt-get install -yq dbus nfs-common rpcbind nfs-kernel-server' failed: exit status 100
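
(For reference, the failed command from that alert could be retried manually on the node; a sketch assuming Debian/Ubuntu workers, with an apt-get update first in case the package index was stale:)

  sudo apt-get update
  sudo DEBIAN_FRONTEND=noninteractive apt-get install -yq dbus nfs-common rpcbind nfs-kernel-server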

For another Portworx node:

Type ID Resource Severity Count LastSeen FirstSeen Description
NODE NodeStartFailure e5d801b2-b3e2-4189-8fd5-7b24e1825ca1 ALARM 1 Feb 8 08:41:22 UTC 2021 Feb 8 08:41:22 UTC 2021 Could not find any available storage disks on this node: unable to start as a storageless node as no storage node found in the cluster. Please add storage to your nodes and restart Portworx.
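
To see whether a node actually exposes an unused disk that Portworx could claim, the block devices can be listed on each worker (a generic check, not Portworx-specific):

  lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT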

Logs for
kubectl logs -n kube-system -l name=portworx --tail=99999

{"status":"Downloading","progressDetail":{"current":199215539,"total":394718365},"progress":"[=========================\u003e ] 199.2MB/394.7MB","id":"e0c0a261a528"}
{"status":"Downloading","progressDetail":{"current":199756211,"total":394718365},"progress":"[=========================\u003e ] 199.8MB/394.7MB","id":"e0c0a261a528"}
{"status":"Extracting","progressDetail":{"current":31195136,"total":183604101},"progress":"[========\u003e ] 31.2MB/183.6MB","id":"ced5f0a8a118"}
{"status":"Downloading","progressDetail":{"current":200296883,"total":394718365},"progress":"[=========================\u003e ] 200.3MB/394.7MB","id":"e0c0a261a528"}
{"status":"Downloading","progressDetail":{"current":200837555,"total":394718365},"progress":"[=========================\u003e ] 200.8MB/394.7MB","id":"e0c0a261a528"}
{"status":"Extracting","progressDetail":{"current":31752192,"total":183604101},"progress":"[========\u003e ] 31.75MB/183.6MB","id":"ced5f0a8a118"}
{"status":"Downloading","progressDetail":{"current":201365939,"total":394718365},"progress":"[=========================\u003e ] 201.4MB/394.7MB","id":"e0c0a261a528"}
{"status":"Downloading","progressDetail":{"current":201906611,"total":394718365},"progress":"[=========================\u003e ] 201.9MB/394.7MB","id":"e0c0a261a528"}
{"status":"Extracting","progressDetail":{"current":32309248,"total":183604101},"progress":"[========\u003e ] 32.31MB/183.6MB","id":"ced5f0a8a118"}
{"status":"Downloading","progressDetail":{"current":202434995,"total":394718365},"progress":"[=========================\u003e ] 202.4MB/394.7MB","id":"e0c0a261a528"}
{"status":"Extracting","progressDetail":{"current":32866304,"total":183604101},"progress":"[========\u003e ] 32.87MB/183.6MB","id":"ced5f0a8a118"}

As per your log message, it clearly shows that there are no storage disks available for Portworx on the other worker nodes:

Could not find any available storage disks on this node: unable to start as a storageless node as no storage node found in the cluster. Please add storage to your nodes and restart Portworx.

Check the Prerequisites and also share your install spec YAML file, so I can get a better understanding of your configuration.
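
For context, the generated spec tells Portworx which drives to use through the portworx container arguments, e.g. -s /dev/<device> for a specific drive or -a to use all available, unused drives. You can check which of these your downloaded spec contains with something like (px-spec.yaml is a placeholder file name):

  grep -nE '"-s"|"-a"|/dev/' px-spec.yaml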

Hi, thanks!

I have added a 60 GB unmounted storage disk to each worker node. However, one of the three nodes shows the message below after executing /opt/pwx/bin/pxctl status:

Node is not in quorum. Waiting to connect to peer nodes on port 9002.

while the other nodes are connected to the KVDB.
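
Since the alert mentions port 9002, a basic connectivity check from the stuck node toward one of its peers would look something like this (the peer IP is a placeholder):

  # check that the Portworx cluster port is reachable from the node that is out of quorum
  nc -zv 10.0.0.12 9002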

If you are reinstalling multiple times on the same nodes, configuration and binaries left over from previous installations might be causing the issue. I would recommend doing a clean uninstall using the docs page Uninstall from Kubernetes cluster. Make sure there are no ConfigMaps or any other Portworx-related objects left in the kube-system namespace, then do a clean install again. Let me know how it goes.
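
For example, after the uninstall you can double-check that nothing Portworx-related is left in kube-system before reinstalling (a rough check that just greps for the usual portworx/px name prefixes):

  kubectl get all,configmaps,secrets,serviceaccounts -n kube-system | grep -iE 'portworx|px-'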


Hey! Yes, Portworx is up and running on the Kubernetes cluster now! Thanks for the help.


Perfect. What turned out to be the issue?