On-prem install - stuck on initializing, http probe failed: 503, tcp connection refused

riso_chras · August 24, 2023, 12:22pm

Dear all,

I am having the following issue and hope someone can offer some input on what might be causing it.
I am trying to install Portworx (Essentials) via the Operator on an OKD Cluster (4.13.0-0.okd-2023-08-18-135805) with 3 worker nodes.
The spec-file looks like this:

kind: StorageCluster
apiVersion: core.libopenstorage.org/v1
metadata:
  name: px-cluster-c-CLUSTER_ID
  namespace: portworx
  annotations:
    portworx.io/install-source: "https://install.portworx.com/?operator=true&mc=false&kbver=1.26.4&ns=portworx&oem=esse&user=4b170b06-xxxx-xxxx-xxxx-xxxxxxxxxxxx&b=true&iop=6&s=%22size%3D150%22&pureSanType=ISCSI&r=17001&c=px-cluster-c-CLUSTER_IDf&osft=true&stork=true&csi=true&mon=true&tel=true&st=k8s&promop=true"
    portworx.io/is-openshift: "true"
    portworx.io/misc-args: "--oem esse"
spec:
  image: portworx/oci-monitor:3.0.0
  imagePullPolicy: Always
  kvdb:
    internal: true
  cloudStorage:
    deviceSpecs:
    - size=150
  secretsProvider: k8s
  startPort: 17001
  stork:
    enabled: true
    args:
      webhook-controller: "true"
  autopilot:
    enabled: true
  runtimeOptions:
    default-io-profile: "6"
  csi:
    enabled: true
  monitoring:
    telemetry:
      enabled: true
    prometheus:
      enabled: true
      exportMetrics: true
  env:
  - name: PURE_FLASHARRAY_SAN_TYPE
    value: "ISCSI"
---
apiVersion: v1
kind: Secret
metadata:
  name: px-essential
  namespace: kube-system
data:
  px-essen-user-id: USER_ID_HASH
  px-osb-endpoint: ENDPOINT_HASH

I have tried creating the storage cluster several times. I first used version 2.13 of Portworx with a custom data network interface; changing these settings individually (to version 3.0 and to auto) didn’t affect the issue.
After hitting “create cluster”, the pods only partially start and I see Readiness probe failed: HTTP probe failed with statuscode: 503 errors together with Readiness probe failed: HTTP probe failed: Get “http://127.0.0.1:17001/status”: dial tcp 127.0.0.1:17001: connect: connection refused. The connection refused errors stop appearing after 20-30 minutes.
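
To tell the two probe failures apart when checking a node by hand, something like the following might help. It is only a minimal sketch: the function name `probe` and the result strings are made up for illustration, and the only thing taken from the events above is the URL `http://127.0.0.1:17001/status`. A refused connection means nothing is listening on the port yet, while a 503 means something answered on the port but returned an error status.

```python
import urllib.error
import urllib.request

# Bypass any proxy settings from the environment; the probe targets localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 2.0) -> str:
    """Send one HTTP GET to `url` and classify the outcome (hypothetical helper)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return f"ready (HTTP {resp.status})"
    except urllib.error.HTTPError as exc:
        # The port is open and a server answered, but with an error status such as 503.
        return f"listening but not ready (HTTP {exc.code})"
    except urllib.error.URLError as exc:
        # No HTTP response at all; check whether the TCP connection was refused.
        if isinstance(exc.reason, ConnectionRefusedError):
            return "connection refused (nothing listening)"
        return f"unreachable ({exc.reason})"
    except OSError as exc:
        # Covers timeouts and other socket-level failures raised directly.
        return f"unreachable ({exc})"
```

Run on the node itself, `print(probe("http://127.0.0.1:17001/status"))` would show which of the two states the port is in at that moment.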

Logs from one of the 3 PX-Cluster pods contain the following warning/error-level messages:

time="2023-08-24T11:09:44Z" level=error msg="Timeout running \"/bin/sh -c 'yum install -y nfs-utils rpcbind'\" command"
time="2023-08-24T11:09:44Z" level=error msg="Could not configure NFS service" error="Could not install NFS service: Command 'yum install -y nfs-utils rpcbind' failed: Timeout"
time="2023-08-24T11:09:30Z" level=warning msg="Detected invalid security context on host's /opt/pwx (attempting to fix it)" ls-out="drwxr-xr-x. 4 root root system_u:object_r:var_t:s0 28 Aug 24 09:38 /opt/pwx\n"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxctl to /usr/bin" error="symlink pxctl to /usr/bin/pxctl failed: symlink /var/opt/pwx/bin/pxctl /usr/bin/pxctl: read-only file system"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/local/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/local/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:45Z" level=warning msg="Reloading + Restarting portworx service"
time="2023-08-24T11:09:46Z" level=warning msg="Service portworx.service not yet active"
time="2023-08-24T11:10:59Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:10:59Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:09Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:09Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:19Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:19Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
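
Since most of these lines are repeats, collapsing them by level and message makes the distinct problems easier to see. The following is a rough sketch that assumes the `key=value` / `key="quoted value"` layout shown in the lines above; `parse_line` and `summarize` are hypothetical helper names, not part of any Portworx tooling.

```python
import re
from collections import Counter

# key=value pairs where the value is either a double-quoted string
# (with backslash escapes) or a bare token without whitespace.
PAIR = re.compile(r'([\w-]+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one log line into a dict of its fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape embedded \" only.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def summarize(lines) -> Counter:
    """Count warning and error lines, grouped by (level, msg)."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") in ("warning", "error"):
            counts[(fields["level"], fields.get("msg", ""))] += 1
    return counts
```

Feeding the pod log into `summarize` and printing `counts.most_common()` would list each distinct message once with its number of occurrences.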

Any hints on what I might be doing wrong?

riso_chras · August 28, 2023, 8:14am

Did some digging on the forum and found a thread from '20:

The suggested solution was to start the service manually:

1/ You stop the service
2/ You run the command line twice: /var/opt/pwx/bin/px-runc run --name portworx --oci /var/opt/pwx/oci

I guess this will be needed on each PX worker node; I'm going to give it a try.
This also makes me wonder: might it be an issue related to Fedora CoreOS?
