Readiness probe error: Get "http://127.0.0.1:17001/status": dial tcp 127.0.0.1:17001: connect: connection refused

Shreyashirwadkar · July 25, 2024, 7:18am

Hi,

We are experiencing issues deploying Portworx Enterprise on both OpenShift Container Platform (OCP) version 4.14 and Kubernetes. Could someone please assist us with finding a solution?

The portworx-api pods are in CrashLoopBackOff state with a readiness probe error (Get "http://127.0.0.1:17001/status": dial tcp 127.0.0.1:17001: connect: connection refused).
The px-csi-ext pods are also in CrashLoopBackOff state.
The StorageCluster and StorageNode objects are stuck in the Initializing phase.

Below are the pod logs:

**# oc get po -A | grep -i px**
openshift-operators                                px-csi-ext-97fff4cf8-9x4vg                                        1/4     CrashLoopBackOff   1365 (75s ago)     42h
openshift-operators                                px-csi-ext-97fff4cf8-v79mj                                        1/4     CrashLoopBackOff   1368 (2m28s ago)   42h
openshift-operators                                px-csi-ext-97fff4cf8-wh2vn                                        1/4     CrashLoopBackOff   1365 (2m5s ago)    42h
openshift-operators                                px-plugin-85d78c474b-gb757                                        1/1     Running            0                  42h
openshift-operators                                px-plugin-85d78c474b-tqfbv                                        1/1     Running            0                  42h
openshift-operators                                px-plugin-proxy-69987b8b6c-lm9x5                                  1/1     Running            0                  42h

**# oc get po -A | grep -i port**
kube-system                                        portworx-proxy-4245c                                              0/1     Running            0                 42h
kube-system                                        portworx-proxy-8jxps                                              0/1     Running            0                 42h
kube-system                                        portworx-proxy-c2t67                                              0/1     Running            0                 42h
kube-system                                        portworx-proxy-g2jqb                                              0/1     Running            0                 42h

openshift-operators                                portworx-6rtqk                                                    0/1     Running            164 (46s ago)     42h
openshift-operators                                portworx-7cjjd                                                    0/1     Running            163 (15m ago)     42h
openshift-operators                                portworx-api-7zkt4                                                0/2     CrashLoopBackOff   456 (2m36s ago)   42h
openshift-operators                                portworx-api-mrkh2                                                0/2     CrashLoopBackOff   456 (3m3s ago)    42h
openshift-operators                                portworx-api-twhfr                                                1/2     Running            456 (5m18s ago)   42h
openshift-operators                                portworx-api-wcxmf                                                0/2     CrashLoopBackOff   456 (2m4s ago)    42h
openshift-operators                                portworx-f4854                                                    0/1     Running            164 (16s ago)     42h
openshift-operators                                portworx-operator-864bb59b95-hkrn9                                1/1     Running            0                 42h
openshift-operators                                portworx-p4dwx                                                    0/1     Running            164 (16s ago)     42h

**#oc describe pod -n openshift-operators  portworx-api-twhfr** 

  Warning  BackOff     6m6s (x10594 over 42h)  kubelet  Back-off restarting failed container csi-node-driver-registrar in pod portworx-api-twhfr_openshift-operators(97f3b8eb-4e8a-4629-81a0-be6256144d1c)
  Warning  ProbeError  57s (x26768 over 42h)   kubelet  Readiness probe error: Get "http://127.0.0.1:17001/status": dial tcp 127.0.0.1:17001: connect: connection refused
body:

**# oc logs -n openshift-operators   px-csi-ext-97fff4cf8-9x4vg csi-external-provisioner**

W0724 06:46:21.621306       1 feature_gate.go:241] Setting GA feature gate Topology=true. It will be removed in a future release.
I0724 06:46:21.621419       1 feature_gate.go:249] feature gates: &{map[Topology:true]}
I0724 06:46:21.621445       1 csi-provisioner.go:154] Version: v3.6.1
I0724 06:46:21.621454       1 csi-provisioner.go:177] Building kube configs for running in cluster...
W0724 06:46:31.623192       1 connection.go:183] Still connecting to unix:///csi/csi.sock
W0724 06:46:41.623213       1 connection.go:183] Still connecting to unix:///csi/csi.sock
W0724 06:46:51.623295       1 connection.go:183] Still connecting to unix:///csi/csi.sock
E0724 06:46:51.623338       1 csi-provisioner.go:215] context deadline exceeded

tomjoseph · July 27, 2024, 1:19am

@Shreyashirwadkar The csi-ext and portworx-api pods will become healthy only once the px cluster pods (oc get pods -l name=portworx) have moved to the Ready state.

To see why the px cluster pods are unhealthy, you can check either those pods' logs or the portworx journal logs using the command below.

journalctl -lefu portworx*

Also, please make sure that the prerequisites are met for your cluster.
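As a rough sketch of the check described above, the helper below filters `oc get po -A` output down to pods whose READY column shows fewer ready containers than total. The function name `not_ready_pods` is made up for illustration, and the namespace in the usage comments is the one from the original post; adjust both to your cluster.

```shell
# Hypothetical helper: print pods that are not fully Ready.
# Expects `oc get po -A --no-headers` style input on stdin:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
not_ready_pods() {
  awk '$3 ~ /^[0-9]+\/[0-9]+$/ {
         split($3, r, "/")                    # r[1] = ready, r[2] = total
         if (r[1] != r[2]) print $1 "/" $2, $3, $4
       }'
}

# Usage (assumes oc is logged in to the cluster):
#   oc get po -A --no-headers | not_ready_pods
#   oc get pods -n openshift-operators -l name=portworx -o wide
# Then, on a node whose portworx pod is not Ready:
#   journalctl -lu 'portworx*'
```

Quoting `'portworx*'` keeps the shell from expanding the pattern against files in the current directory before journalctl sees it.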

Shreyashirwadkar · August 30, 2024, 6:12am

Thanks @tomjoseph,
We are not able to bring the pods into the Ready state because connections to port 17001 are refused:

**# oc describe pod -n openshift-operators portworx-api-twhfr**

Warning BackOff 6m6s (x10594 over 42h) kubelet Back-off restarting failed container csi-node-driver-registrar in pod portworx-api-twhfr_openshift-operators(97f3b8eb-4e8a-4629-81a0-be6256144d1c)
Warning ProbeError 57s (x26768 over 42h) kubelet Readiness probe error: Get "http://127.0.0.1:17001/status": dial tcp 127.0.0.1:17001: connect: connection refused

Are there any additional steps required to open these ports?

tomjoseph · September 16, 2024, 8:50am

@Shreyashirwadkar Could you share the portworx journal logs from any node where the px cluster pod is not becoming Ready, so that we can check the exact error?

journalctl -lu portworx*

kgrando · April 30, 2025, 10:32am

Hi, were you ever able to fix this? I have the same problem in my AKS cluster. I tested some node downtimes, failovers, etc.
After a node comes back up or recovers, the portworx-api and cluster pods on that node are in CrashLoopBackOff state.
I think the cluster pod is waiting for the api pod, which in turn is waiting for the px-csi-ext pod. It is trying to connect via a unix socket, but the third px-csi-ext pod has already been rescheduled onto another node. It runs fine there, but my 3-node cluster now has two csi pods on one node. If I delete the surplus pod on that node and it gets scheduled on the recovered node, everything works again.

This could be fixed if the operator created podAntiAffinity rules or, better, topologySpreadConstraints on the csi pods.
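To check whether a cluster is in the state described above (more than one px-csi-ext pod on the same node), something like the following sketch could be used. The function name `csi_pods_per_node` is made up for illustration, and the `px-csi-ext-` name prefix is taken from the logs in this thread.

```shell
# Hypothetical helper: count px-csi-ext pods per node.
# Expects two columns on stdin: pod name, then node name.
csi_pods_per_node() {
  awk '$1 ~ /^px-csi-ext-/ { count[$2]++ }
       END { for (n in count) print count[n], n }' | sort -rn
}

# Usage (assumes kubectl access to the cluster):
#   kubectl get pods -A --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | csi_pods_per_node
# A node listed with a count of 2 or more holds a surplus pod. Deleting that pod
# lets the scheduler place it again, ideally on the recovered node:
#   kubectl delete pod <pod-name> -n <namespace>
```

This only reports the skew; it does not stop the scheduler from stacking the pods again after the next failover.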

Here are some logs:

px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:06Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:08.851Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:08Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 10 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:14.911152       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:15.244209       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:15.559750       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:16Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:16.352Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:16Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 11 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:23.853Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:23Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 12 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:24.910456       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:25.243267       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:25.559777       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:26Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:31.354Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:31Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 13 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:34.910692       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	E0430 09:32:34.910764       1 csi-provisioner.go:215] context deadline exceeded
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: cri-containerd-05115797d210396a85851a4b9acb7159154d16b61b217fde19f976f6eed0612e.scope: Deactivated successfully.
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: run-containerd-io.containerd.runtime.v2.task-k8s.io-05115797d210396a85851a4b9acb7159154d16b61b217fde19f976f6eed0612e-rootfs.mount: Deactivated successfully.
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:35.243328       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-csi-ext-664b9dbf76-b2v8v	E0430 09:32:35.243462       1 main.go:175] error connecting to CSI driver: context deadline exceeded
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: cri-containerd-a4df84f9ff04d667e9f190ab571fa782392a7b054db9148628754d5f77ba5c1a.scope: Deactivated successfully.
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: run-containerd-io.containerd.runtime.v2.task-k8s.io-a4df84f9ff04d667e9f190ab571fa782392a7b054db9148628754d5f77ba5c1a-rootfs.mount: Deactivated successfully.
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:35.559625       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-csi-ext-664b9dbf76-b2v8v	E0430 09:32:35.559940       1 main.go:153] "Failed to create CSI client" err="failed to connect to CSI driver: context deadline exceeded"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: cri-containerd-daf38ab9e9dafffec341accff538d1ff94328654d45ba62d74a393f5bc1207d4.scope: Deactivated successfully.
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: run-containerd-io.containerd.runtime.v2.task-k8s.io-daf38ab9e9dafffec341accff538d1ff94328654d45ba62d74a393f5bc1207d4-rootfs.mount: Deactivated successfully.
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:35.745422       1 feature_gate.go:241] Setting GA feature gate Topology=true. It will be removed in a future release.
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:35.745501       1 feature_gate.go:249] feature gates: &{map[Topology:true]}
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:35.745533       1 csi-provisioner.go:154] Version: v3.6.1
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:35.745538       1 csi-provisioner.go:177] Building kube configs for running in cluster...
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: Started libcontainer container 7ae43a0ef5b5341660740defe1635dcc8793ebfb01c96cd80b333d1effb8ec26.
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:36.056164       1 main.go:108] Version: v8.1.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: Started libcontainer container fdadd0271e23b7193c02ae4a0c6f93556b1ab566a8b87be77536a18103313f17.
px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:36Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:36.817893       1 main.go:108] "Version" version="v1.12.0"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:36.817962       1 feature_gate.go:387] feature gates: {map[]}
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 systemd[1]: Started libcontainer container 32ef94a3a4fafed92faea2d4d69e80d50377023798e5b21091eb3e9ea2414ccc.
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:38.856Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:38Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 14 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:45.746820       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:46.058352       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:46.357Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:46Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 15 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:46Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:46.819547       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:32:53.860Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:32:53Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 16 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:32:55.746354       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:56.057359       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	time="2025-04-30T09:32:56Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:9001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:9001: connect: connection refused"
px-csi-ext-664b9dbf76-b2v8v	I0430 09:32:56.819758       1 connection.go:253] "Still connecting" address="unix:///csi/csi.sock"
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: {"level":"warn","ts":"2025-04-30T09:33:01.361Z","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-5a2e911b-5e51-4538-a972-e6a858285c54/10.224.0
px-cnpg-cluster-1-tkpvj	@aks-portworx-39265852-vmss000004 portworx[862]: time="2025-04-30T09:33:01Z" level=error msg="[set: testConnection] kvdb error: rpc error: code = Unknown desc = context deadline exceeded, retry count 17 \n" file="kv_etcd.go:1793" component=kvdb/etcd/v3
px-csi-ext-664b9dbf76-b2v8v	W0430 09:33:05.746475       1 connection.go:183] Still connecting to unix:///csi/csi.sock
px-csi-ext-664b9dbf76-b2v8v	E0430 09:33:05.746541       1 csi-provisioner.go:215] context deadline exceeded
