Error installing in OpenShift CodeReady Containers

I do not have access to a standard cluster, so I am trying to leverage OpenShift CodeReady Containers (CRC), the follow-on to Minishift, where OpenShift runs a single-node cluster in a VM. Specifically, a virt-manager KVM VM on Ubuntu (Pop!_OS) 20.04 LTS.

I am trying to recreate the steps in https://portworx.com/run-ha-sql-server-red-hat-openshift/ for an HA SQL Server. All goes well until I apply the StorageCluster YAML (oc apply -f px-spec.yaml), after which no Portworx pods start. The error event for the StorageCluster is:

Type     Reason      Age                  From                       Message
----     ------      ----                 ----                       -------
Warning  FailedSync  25s (x7 over 6m25s)  storagecluster-controller  error connecting to GRPC server [172.25.98.51:9020]: Connection timed out

Not sure what it wants, or if this is even possible in CodeReady Containers, but I would appreciate any insight.

Thanks!

Of the two most common issues we’ve seen, your log entry suggests you may already be hitting the first: TCP ports 9001-9022 (as well as UDP port 9002) must be reachable between all of the nodes running Portworx. It’s likely that the iptables firewall CRC sets up does not open these ports, so the traffic is being blocked. You’ll need to make sure CRC opens these ports between nodes. (As of OpenShift 4.3, the range begins at 17001 rather than 9001.)
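
As a quick sanity check, you can probe one of the Portworx ports using the node IP from your error message (a sketch; assumes nc is available on or near the node):

$ nc -zv 172.25.98.51 9020
# Expect a successful connection once the firewall allows the port and
# Portworx is listening; on OpenShift 4.3+ probe 17020 instead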

The second issue to watch out for is that Portworx requires an unused block device on each Portworx node in order to come up properly in normal mode (otherwise the node starts as a storageless node). These devices allow it to form the StorageCluster and provide the storage overlay services it is designed for (we also set up a separate dedicated etcd cluster for internal kvdb purposes, which won’t include storageless nodes). I don’t know enough about CRC yet to say whether it is configured with block devices this way or, if not, what needs to be done to add them, but I can look into it more if needed.
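
If the CRC VM has no spare disk, one option is to attach one from the host via libvirt (a minimal sketch; the domain name "crc" and the image path are assumptions, adjust both to your setup):

$ qemu-img create -f qcow2 /var/lib/libvirt/images/px-data.qcow2 20G
$ virsh attach-disk crc /var/lib/libvirt/images/px-data.qcow2 vdb \
    --driver qemu --subdriver qcow2 --persistent
# The disk should then show up inside the node as /dev/vdb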

For now, please make sure the minimum required resources are available, per our documentation available here. Once those requirements are met, you’d typically go to the OpenShift web console, navigate to Operators, install the Portworx Operator, and generate a StorageCluster spec from install.portworx.com.
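
The spec generator at install.portworx.com returns the YAML directly, so you can also fetch it from the command line (a sketch; the query parameters shown are illustrative, the generator builds the exact string for your cluster):

$ curl -fsSL -o px-spec.yaml \
    "https://install.portworx.com/?operator=true&kbver=1.18.3&s=%2Fdev%2Fvdb&c=px-cluster-example&stork=true&b=true"
$ oc apply -f px-spec.yaml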

Thank you for the reply. The CRC I am running contains OpenShift 4.5.4.

CRC is a single node cluster in a VM:

$ oc get nodes
NAME                 STATUS   ROLES                   AGE   VERSION
crc-fd5nx-master-0   Ready    compute,master,worker   9d    v1.18.3+012b3ec

The node is running RHEL CoreOS:

[root@crc-fd5nx-master-0 core]# cat /etc/redhat-release 
Red Hat Enterprise Linux CoreOS release 4.5

I added rules for the ports you mention:

iptables -A INPUT -p tcp --dport 9001:9022 -j ACCEPT
iptables -A INPUT -p udp --dport 9002 -j ACCEPT

I also see rules referencing the IP in question, apparently added by your operator:

[root@crc-fd5nx-master-0 core]# iptables -L|grep 172.25.98.51
REJECT     tcp  --  anywhere             172.25.98.51         /* kube-system/portworx-service:px-rest-gateway has no endpoints */ tcp dpt:panagolin-ident reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.25.98.51         /* kube-system/portworx-service:px-api has no endpoints */ tcp dpt:etlservicemgr reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.25.98.51         /* kube-system/portworx-service:px-sdk has no endpoints */ tcp dpt:tambora reject-with icmp-port-unreachable
REJECT     tcp  --  anywhere             172.25.98.51         /* kube-system/portworx-service:px-kvdb has no endpoints */ tcp dpt:9019 reject-with icmp-port-unreachable

Also, the raw block device is attached to the VM as /dev/vdb:

[root@crc-fd5nx-master-0 core]# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                          252:0    0   31G  0 disk 
├─vda1                       252:1    0  384M  0 part /boot
├─vda2                       252:2    0  127M  0 part /boot/efi
├─vda3                       252:3    0    1M  0 part 
└─vda4                       252:4    0 30.5G  0 part 
  └─coreos-luks-root-nocrypt 253:0    0 30.5G  0 dm   /sysroot
vdb                          252:16   0   20G  0 disk

In any case, I still see the same failure:

Warning FailedSync 3m23s (x46 over 48m) storagecluster-controller error connecting to GRPC server [172.25.98.51:9020]: Connection timed out

I suspect the CRC node, being a master as well, has a taint that disallows running pods. Also, by default, if you do not give any placement constraints, the Portworx Operator will exclude master/infra nodes. That’s because Portworx generally doesn’t run on master nodes, since no apps run there.

Can you paste the output of the following:

  • spec.taints from the OpenShift node
  • oc get storagecluster -oyaml
  • oc get pods -n <storagecluster_namespace>

If the CRC node has a taint, you can add a toleration in the StorageCluster spec (spec.placement.tolerations) to tolerate the master taint. Otherwise, changing the StorageCluster placement should work too (spec.placement.nodeAffinity).
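
For reference, tolerating the standard master taint would look something like this in the StorageCluster spec (a sketch; match the key and effect to whatever oc describe node actually reports):

  spec:
    placement:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule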

$ oc describe node crc-fd5nx-master-0 | grep Taint
Taints:             <none>

$ oc get StorageCluster -n kube-system
NAME                                              CLUSTER UUID   STATUS         VERSION   AGE
px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3                  Initializing   2.5.5     4d4h
$ oc get StorageCluster -n kube-system -oyaml
apiVersion: v1
items:
- apiVersion: core.libopenstorage.org/v1alpha1
  kind: StorageCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"core.libopenstorage.org/v1alpha1","kind":"StorageCluster","metadata":{"annotations":{"portworx.io/install-source":"https://install.portworx.com/?mc=false\u0026kbver=1.18.3\u0026oem=esse\u0026user=49e58b34-d6ca-11ea-a2c5-c24e499c7467\u0026b=true\u0026s=%2Fdev%2Fvdb\u0026c=px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3\u0026osft=true\u0026operator=true\u0026stork=true\u0026lh=true\u0026st=k8s\u0026rsec=regcred","portworx.io/is-openshift":"true","portworx.io/misc-args":"--oem esse"},"name":"px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3","namespace":"kube-system"},"spec":{"autopilot":{"enabled":true,"image":"portworx/autopilot:1.2.1","providers":[{"name":"default","params":{"url":"http://prometheus:9090"},"type":"prometheus"}]},"deleteStrategy":{"type":"UninstallAndWipe"},"image":"portworx/oci-monitor:2.5.5","imagePullPolicy":"Always","imagePullSecret":"regcred","kvdb":{"internal":true},"secretsProvider":"k8s","storage":{"devices":["/dev/vdb"]},"stork":{"enabled":true,"image":"openstorage/stork:2.4.3"},"userInterface":{"enabled":true,"image":"portworx/px-lighthouse:2.0.7"}}}
      portworx.io/install-source: https://install.portworx.com/?mc=false&kbver=1.18.3&oem=esse&user=49e58b34-d6ca-11ea-a2c5-c24e499c7467&b=true&s=%2Fdev%2Fvdb&c=px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3&osft=true&operator=true&stork=true&lh=true&st=k8s&rsec=regcred
      portworx.io/is-openshift: "true"
      portworx.io/misc-args: --oem esse
    creationTimestamp: "2020-08-06T22:10:40Z"
    finalizers:
    - operator.libopenstorage.org/delete
    generation: 2
    name: px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3
    namespace: kube-system
    resourceVersion: "416119"
    selfLink: /apis/core.libopenstorage.org/v1alpha1/namespaces/kube-system/storageclusters/px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3
    uid: 0ffc4379-7591-441d-81de-6f7daf05f6c6
  spec:
    autopilot:
      enabled: true
      image: portworx/autopilot:1.2.1
      providers:
      - name: default
        params:
          url: http://prometheus:9090
        type: prometheus
    deleteStrategy:
      type: UninstallAndWipe
    image: portworx/oci-monitor:2.5.5
    imagePullPolicy: Always
    imagePullSecret: regcred
    kvdb:
      internal: true
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: px/enabled
              operator: NotIn
              values:
              - "false"
            - key: node-role.kubernetes.io/infra
              operator: DoesNotExist
            - key: node-role.kubernetes.io/master
              operator: DoesNotExist
    revisionHistoryLimit: 10
    secretsProvider: k8s
    startPort: 17001
    storage:
      devices:
      - /dev/vdb
    stork:
      enabled: true
      image: openstorage/stork:2.4.3
    updateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
    userInterface:
      enabled: true
      image: portworx/px-lighthouse:2.0.7
    version: 2.5.5
  status:
    clusterName: px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3
    phase: Initializing
    storage: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get pods -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
autopilot-85779dc5cc-9p8ls         0/1     Pending   0          4d4h
px-lighthouse-57477fbb7-jvln6      0/3     Pending   0          4d4h
stork-6fdb74cb88-7d7rt             0/1     Pending   0          4d4h
stork-6fdb74cb88-g6h66             0/1     Pending   0          4d4h
stork-6fdb74cb88-pj9dc             0/1     Pending   0          4d4h
stork-scheduler-6c5987c55d-2z99m   0/1     Pending   0          4d4h
stork-scheduler-6c5987c55d-mp5bn   0/1     Pending   0          4d4h
stork-scheduler-6c5987c55d-rsvr9   0/1     Pending   0          4d4h

Looks like the node does not have any taints.

Can you change the StorageCluster spec’s placement as below:

  spec:
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: px/enabled
              operator: NotIn
              values:
              - "false"

Basically, remove the default constraint that prevents it from running on master/infra nodes.
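
One way to apply the change is a merge patch against the StorageCluster, using the cluster name from the output above (a sketch; oc edit works just as well, since a JSON merge patch replaces the whole nodeSelectorTerms list):

$ oc -n kube-system patch storagecluster \
    px-cluster-7afdf222-6e13-44ee-bd54-e5493f54bff3 --type=merge \
    -p '{"spec":{"placement":{"nodeAffinity":{"requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{"matchExpressions":[{"key":"px/enabled","operator":"NotIn","values":["false"]}]}]}}}}}'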

Yes, that allowed the Portworx pod to start.

Thank you very much for all your help.