Portworx unaware of FCOS constraints in OKD

I tried to install Portworx Essentials on-site for a customer this morning. The customer wants to try Essentials and maybe switch to Enterprise later.

We have an OKD 4.5 cluster and tried to install Portworx v2.6.1.5 using the operator method.

The px-cluster-* pods never become ready. The logs show several issues caused by Portworx trying to run commands that are not available on FCOS:

time="2020-11-19T10:31:16Z" level=warning msg="Detected invalid security context on host's /opt/pwx (attempting to fix it)" ls-out="drwxr-xr-x. 4 root root system_u:object_r:var_t:s0 40 Nov 19 10:01 /opt/pwx\n"
time="2020-11-19T10:31:16Z" level=info msg="> run-host: /bin/sh -c d=$(readlink -m /opt/pwx) ; semanage fcontext -a -t usr_t $d'(/.*)?' ; restorecon -Rv $d/"

/bin/sh: semanage: command not found

time="2020-11-19T10:31:26Z" level=info msg="> run-host: /bin/sh -c 'yum clean all && exec yum makecache'"
/bin/sh: yum: command not found
time="2020-11-19T10:31:26Z" level=error msg="Error running "/bin/sh -c 'yum clean all && exec yum makecache'" command" error="exit status 127"
time="2020-11-19T10:31:26Z" level=error msg="Could not enable NFS service" error="Could not install NFS service: Command 'yum clean all && exec yum makecache' failed: exit status 127"
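
To double-check which of the host tools named in these log lines are really absent, something like the following could be run in a host shell on one of the nodes (for example after `oc debug node/<node>` and `chroot /host`). This is only a minimal sketch and not part of Portworx; the helper name `missing_tools` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (not a Portworx tool): print every given command
# name that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# The commands the Portworx log shows being run on the host.
missing_tools semanage restorecon yum
```

Any name it prints is a command the installer cannot run on that host.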

Nowhere is it documented that OKD/FCOS is not supported.

We do support OKD. However, newer OKD/OpenShift releases occasionally introduce slight changes that are missed in testing, and that might be what happened here. Can you share which version you're deploying (please include the oci-monitor image tag of the pods that were deployed by the operator)? We will update this post shortly with more detail once we can confirm which version is needed for this to be supported.

Hi @aleks

OKD Version: 4.5.0-0.okd-2020-10-15-235428
Operator Version: portworx-essentials.v1.4.2

Pod images:
docker.io/portworx/autopilot@sha256:0235cdeb40085827f92e435447568a8fd89cd0392399a88388293eadf0356953
docker.io/portworx/px-operator@sha256:1692041ab0b0920e9f4b6096e23b2b56a8ec8f077deb3d797755646b56ddb5bd
quay.io/prometheus/prometheus@sha256:580d5812b20d4f49e1834b05d99a070189688f974ea45f5e1ecb9f773fb3e261
quay.io/coreos/prometheus-config-reloader@sha256:30935f4562227031949ba780eaa4566373524a1cc8450291db1122f73231cd46
quay.io/coreos/configmap-reload@sha256:a5b867760354219426b64fcdf1f7396e1625fbc05d1c336ed9836ab9c78edeb6
docker.io/portworx/oci-monitor@sha256:d06819854f41bfb9dc44c2c2d2c2915d42525c959e810914cc9e38b8702a8dbf
quay.io/k8scsi/csi-node-driver-registrar@sha256:13daf82fb99e951a4bff8ae5fc7c17c3a8fe7130be6400990d8f6076c32d4599
quay.io/openstorage/csi-provisioner@sha256:3d103b45cb896603890cf05b1727397d64e051eb78474d074bfac4a45fa3c597
quay.io/k8scsi/csi-snapshotter@sha256:35ead85dd09aa8cc612fdb598d4e0e2f048bef816f1b74df5eeab67cd21b10aa
quay.io/k8scsi/csi-resizer@sha256:6c6a0332693a7c456378f6abd2bb40611826c1e1a733cadbdae2daab3125b71c
docker.io/portworx/lh-config-sync@sha256:5364ced9b47ddadc226453f7845ca7c735a438af7c221c88ea32f1f4b185bb9e
quay.io/coreos/prometheus-operator@sha256:c0bcb231fe67cd11fd26f7adf5ac1080dfac189ac94705538bd4ab7dd99a98a9
docker.io/openstorage/stork@sha256:6c4d85f7d274afb5828083dd8a6d3538fcb5f7fa8be5e9552e2e8c79ed41096b

I hope I caught them all.

Hi @aleks

It looks like the forum ate my reply from yesterday, maybe because of the image names.

Anyway, here is a list of the image IDs:

OKD Version is 4.5.0-0.okd-2020-10-15-235428
Operator: portworx-essentials.v1.4.2

Kind regards
Thomas

Thanks for that info. The images are listed by digest, so we'd need to look up those digests to map them to version numbers. In the meantime, can you also paste the output of your StorageCluster resource? Running oc get storagecluster -A -o yaml should show how the operator is installing Portworx (along with versions).

@aleks Here we go:

@aleks any idea what’s going wrong?

Apologies for the delay. We're discussing internally how to get you unblocked. Our main focus has been OpenShift testing, so it's unclear why an OKD setup of the same version does not appear to be set up, or to work, the same way.

Hi @aleks

Thanks for the feedback!

Wild guess: could it be that during setup you check whether the node runs RHEL or Red Hat CoreOS (RHCOS)? In OKD the nodes run CentOS or Fedora CoreOS (FCOS) instead. Without knowing your code, I guess you check whether it's RHCOS or not. That check says "no" (because it's FCOS), so you fall back to the RHEL code path. But FCOS, just like RHCOS, has no yum, semanage, etc.
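
To illustrate the guess: the sketch below is not Portworx's actual detection logic, just a hypothetical check that treats both CoreOS flavours alike by looking at `VARIANT_ID` as well as `ID` in os-release data. It assumes that FCOS reports `ID=fedora` with `VARIANT_ID=coreos` and that RHCOS reports `ID="rhcos"`; verify this against `/etc/os-release` on your own nodes. The function name `classify_os` is made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch, NOT Portworx's real code.
# Reads os-release content on stdin and prints "coreos" or "other".
classify_os() {
    id=""
    variant_id=""
    while IFS='=' read -r key value; do
        # Strip optional surrounding double quotes from the value.
        value=${value#\"}
        value=${value%\"}
        case "$key" in
            ID) id=$value ;;
            VARIANT_ID) variant_id=$value ;;
        esac
    done
    # Assumption: RHCOS sets ID=rhcos, FCOS sets ID=fedora plus VARIANT_ID=coreos.
    if [ "$id" = "rhcos" ] || [ "$variant_id" = "coreos" ]; then
        echo "coreos"
    else
        echo "other"
    fi
}

# On a real node: classify_os < /etc/os-release
printf 'ID=fedora\nVARIANT_ID=coreos\n' | classify_os
```

A check that only compares `ID` against `rhcos` would classify the FCOS sample as "other", which is the fallback behaviour suspected here.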

Hello tdeutsch,

It looks like the CentOS/Fedora version of CoreOS was copied from Red Hat CoreOS, but with tools/packages removed that are required to fix up the host and run Portworx:

  • The /opt/ directory on Red Hat CoreOS is a soft link to /var/opt, but it carries the wrong SELinux label: it should be system_u:object_r:usr_t:s0 instead of ...var_t:s0 (one can confirm this by checking the SELinux labels on a RHEL/CentOS server with ls -alZ /var/ /opt/).
    • The Portworx install attempts to fix this invalid SELinux label automatically, but it fails because the "semanage" command appears to have been stripped out of the CentOS/Fedora CoreOS distribution.
  • The yum makecache ; yum install nfs-utils rpcbind commands are triggered when Portworx finds that nfs-server.service is not running on the host. The install then tries to install the NFS packages so that it can enable "Shared V4" volume support.
    • To fix this, either use the -disable-sharedv4 option (i.e. add the portworx.io/misc-args: "-disable-sharedv4" annotation to the StorageCluster) or ensure that the NFS service is installed and running on the host.
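
As a rough sketch of the annotation route, the wrapper below applies the annotation with `oc annotate`. The namespace and StorageCluster name are placeholders you must supply (the names in the usage comment are hypothetical), and the function name `disable_sharedv4` is made up for this example.

```shell
#!/bin/sh
# OC can be overridden (e.g. OC=echo) to preview the command instead of running it.
OC="${OC:-oc}"

# Add the portworx.io/misc-args annotation to a StorageCluster.
# $1 = namespace, $2 = StorageCluster name (both are placeholders).
# Note: --overwrite replaces any misc-args value that is already set.
disable_sharedv4() {
    "$OC" annotate storagecluster "$2" -n "$1" \
        'portworx.io/misc-args=-disable-sharedv4' --overwrite
}

# Example with hypothetical names:
#   disable_sharedv4 kube-system px-cluster
```

Afterwards, oc get storagecluster -A -o yaml should show the annotation under metadata.annotations.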