Dear all,
I am having the following issue and hope someone can offer some input on what its source might be.
I am trying to install Portworx (Essentials) via the Operator on an OKD Cluster (4.13.0-0.okd-2023-08-18-135805) with 3 worker nodes.
The spec-file looks like this:
kind: StorageCluster
apiVersion: core.libopenstorage.org/v1
metadata:
  name: px-cluster-c-CLUSTER_ID
  namespace: portworx
  annotations:
    portworx.io/install-source: "https://install.portworx.com/?operator=true&mc=false&kbver=1.26.4&ns=portworx&oem=esse&user=4b170b06-xxxx-xxxx-xxxx-xxxxxxxxxxxx&b=true&iop=6&s=%22size%3D150%22&pureSanType=ISCSI&r=17001&c=px-cluster-c-CLUSTER_IDf&osft=true&stork=true&csi=true&mon=true&tel=true&st=k8s&promop=true"
    portworx.io/is-openshift: "true"
    portworx.io/misc-args: "--oem esse"
spec:
  image: portworx/oci-monitor:3.0.0
  imagePullPolicy: Always
  kvdb:
    internal: true
  cloudStorage:
    deviceSpecs:
    - size=150
  secretsProvider: k8s
  startPort: 17001
  stork:
    enabled: true
    args:
      webhook-controller: "true"
  autopilot:
    enabled: true
  runtimeOptions:
    default-io-profile: "6"
  csi:
    enabled: true
  monitoring:
    telemetry:
      enabled: true
    prometheus:
      enabled: true
      exportMetrics: true
  env:
  - name: PURE_FLASHARRAY_SAN_TYPE
    value: "ISCSI"
---
apiVersion: v1
kind: Secret
metadata:
  name: px-essential
  namespace: kube-system
data:
  px-essen-user-id: USER_ID_HASH
  px-osb-endpoint: ENDPOINT_HASH
I tried to create the storage cluster several times: at first I used version 2.13 of Portworx and also set a custom data network interface. Changing these settings (individually) to 3.0 and auto didn’t affect the issues.
What happens after hitting “create cluster”: the pods partially start, and I see Readiness probe failed: HTTP probe failed with statuscode: 503 errors together with Readiness probe failed: HTTP probe failed: Get "http://127.0.0.1:17001/status": dial tcp 127.0.0.1:17001: connect: connection refused. The connection refused errors stop appearing after 20-30 minutes.
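The two probe failures above mean different things: "connection refused" says nothing is listening on the port yet, while a 503 says the service answers but reports itself as not ready. A minimal sketch for telling them apart by hand from a node, assuming Python 3 is available there; `classify_probe` is a hypothetical helper name, not part of Portworx, and the URL is simply the one from the probe message:

```python
import urllib.error
import urllib.request


def classify_probe(url: str, timeout: float = 5.0) -> str:
    """Issue one HTTP GET to `url` and return a short label for the outcome."""
    # Bypass any proxy from the environment, since the target is on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"ready (HTTP {resp.status})"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return f"not ready (HTTP {err.code})"
    except urllib.error.URLError as err:
        # Nothing answered at all, e.g. connection refused.
        return f"unreachable ({err.reason})"
    except OSError as err:
        # Lower-level socket failures, e.g. a timeout while reading.
        return f"unreachable ({err})"
```

Calling `classify_probe("http://127.0.0.1:17001/status")` on a worker node should then show which of the two states the node is currently in.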
Logs from one of the 3 PX-Cluster pods contain the following warning/error level messages:
time="2023-08-24T11:09:44Z" level=error msg="Timeout running \"/bin/sh -c 'yum install -y nfs-utils rpcbind'\" command"
time="2023-08-24T11:09:44Z" level=error msg="Could not configure NFS service" error="Could not install NFS service: Command 'yum install -y nfs-utils rpcbind' failed: Timeout"
time="2023-08-24T11:09:30Z" level=warning msg="Detected invalid security context on host's /opt/pwx (attempting to fix it)" ls-out="drwxr-xr-x. 4 root root system_u:object_r:var_t:s0 28 Aug 24 09:38 /opt/pwx\n"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxctl to /usr/bin" error="symlink pxctl to /usr/bin/pxctl failed: symlink /var/opt/pwx/bin/pxctl /usr/bin/pxctl: read-only file system"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/local/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:44Z" level=warning msg="Could not link /var/opt/pwx/bin/pxc to /usr/local/bin" error="pxc command not found: stat /var/opt/pwx/bin/pxc: no such file or directory"
time="2023-08-24T11:09:45Z" level=warning msg="Reloading + Restarting portworx service"
time="2023-08-24T11:09:46Z" level=warning msg="Service portworx.service not yet active"
time="2023-08-24T11:10:59Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:10:59Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:09Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:09Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:19Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
time="2023-08-24T11:11:19Z" level=warning msg="Could not retrieve PX node status" error="Get \"http://127.0.0.1:17001/v1/cluster/nodehealth\": dial tcp 127.0.0.1:17001: connect: connection refused"
Any hints on what I might be doing wrong?