Portworx installation on EKS is stuck: no new logs, no events

I tried to install Portworx Essentials 3.2 on AWS EKS (v1.28.15-eks-7f9249a). I made sure the prerequisites were met, but the operator is stuck initializing. Restarting the operator deployment brings it back to exactly the same point. No other pods are created; only the operator pod exists.

Logs (they don’t advance; they stay stuck at exactly this point):

➜  ~ k logs -f portworx-operator-64bb76bbf7-rtpdd
time="27-11-2024 12:18:04" level=info msg="Starting openstorage operator version 24.1.3-d831f9cc" file="operator.go:125"
time="27-11-2024 12:18:04" level=info msg="Registering components" file="operator.go:167"
time="27-11-2024 12:18:04" level=info msg="Found namespaceNamespaceportworx" file="k8sutil.go:80"
time="27-11-2024 12:18:04" level=info msg="Found podnamePod.Nameportworx-operator-64bb76bbf7-rtpdd" file="k8sutil.go:127"
time="27-11-2024 12:18:04" level=info msg="Found PodPod.NamespaceportworxPod.Nameportworx-operator-64bb76bbf7-rtpdd" file="k8sutil.go:142"
time="27-11-2024 12:18:04" level=info msg="Pods owner foundKindDeploymentNameportworx-operatorNamespaceportworx" file="metrics.go:174"
time="27-11-2024 12:18:04" level=info msg="Metrics Service object updatedService.Nameportworx-operator-metricsService.Namespaceportworx" file="metrics.go:94"
time="27-11-2024 12:18:04" level=warning msg="Failed to create ServiceMonitor: no ServiceMonitor registered with the API" file="operator.go:235"
time="27-11-2024 12:18:04" level=info msg="cluster is running k8s distribution v1.28.15-eks-7f9249a" file="preflight.go:139"
time="27-11-2024 12:18:04" level=info msg="Migration is enabled" file="operator.go:268"
2024/11/27 12:18:04 DoRetryWithTimeout - Error: {the cache is not started, can not read objects}, Next try in [1s], timeout [30s]
I1127 12:18:04.929594       1 leaderelection.go:248] attempting to acquire leader lease portworx/openstorage-operator...
time="27-11-2024 12:18:05" level=info msg="Migration is not needed" file="migration.go:76"
I1127 12:18:21.565181       1 leaderelection.go:258] successfully acquired lease portworx/openstorage-operator

Events (these include the events from me restarting the deployment):

➜  ~ k get events
LAST SEEN   TYPE     REASON              OBJECT                                    MESSAGE
31m         Normal   LeaderElection      configmap/openstorage-operator            portworx-operator-86dff4955-qxvhx_f5111ece-0896-4f32-ad2f-9eb97d1f2a00 became leader
31m         Normal   LeaderElection      lease/openstorage-operator                portworx-operator-86dff4955-qxvhx_f5111ece-0896-4f32-ad2f-9eb97d1f2a00 became leader
19m         Normal   LeaderElection      configmap/openstorage-operator            portworx-operator-64bb76bbf7-rtpdd_12cb21ed-52cc-47a0-81ba-ab8513ace288 became leader
19m         Normal   LeaderElection      lease/openstorage-operator                portworx-operator-64bb76bbf7-rtpdd_12cb21ed-52cc-47a0-81ba-ab8513ace288 became leader
5m53s       Normal   LeaderElection      configmap/openstorage-operator            portworx-operator-64bb76bbf7-rtpdd_6827e226-b601-424b-8f69-9219dbd91417 became leader
5m53s       Normal   LeaderElection      lease/openstorage-operator                portworx-operator-64bb76bbf7-rtpdd_6827e226-b601-424b-8f69-9219dbd91417 became leader
19m         Normal   Scheduled           pod/portworx-operator-64bb76bbf7-rtpdd    Successfully assigned portworx/portworx-operator-64bb76bbf7-rtpdd to ip-192-168-14-48.ec2.internal
6m10s       Normal   Pulling             pod/portworx-operator-64bb76bbf7-rtpdd    Pulling image "portworx/px-operator:24.1.3"
19m         Normal   Pulled              pod/portworx-operator-64bb76bbf7-rtpdd    Successfully pulled image "portworx/px-operator:24.1.3" in 6.319s (6.319s including waiting)
6m10s       Normal   Created             pod/portworx-operator-64bb76bbf7-rtpdd    Created container portworx-operator
6m10s       Normal   Started             pod/portworx-operator-64bb76bbf7-rtpdd    Started container portworx-operator
6m10s       Normal   Pulled              pod/portworx-operator-64bb76bbf7-rtpdd    Successfully pulled image "portworx/px-operator:24.1.3" in 119ms (119ms including waiting)
19m         Normal   SuccessfulCreate    replicaset/portworx-operator-64bb76bbf7   Created pod: portworx-operator-64bb76bbf7-rtpdd
32m         Normal   Scheduled           pod/portworx-operator-86dff4955-qxvhx     Successfully assigned portworx/portworx-operator-86dff4955-qxvhx to ip-192-168-8-77.ec2.internal
32m         Normal   Pulling             pod/portworx-operator-86dff4955-qxvhx     Pulling image "portworx/px-operator:24.1.3"
32m         Normal   Pulled              pod/portworx-operator-86dff4955-qxvhx     Successfully pulled image "portworx/px-operator:24.1.3" in 4.735s (4.735s including waiting)
32m         Normal   Created             pod/portworx-operator-86dff4955-qxvhx     Created container portworx-operator
32m         Normal   Started             pod/portworx-operator-86dff4955-qxvhx     Started container portworx-operator
19m         Normal   Killing             pod/portworx-operator-86dff4955-qxvhx     Stopping container portworx-operator
32m         Normal   SuccessfulCreate    replicaset/portworx-operator-86dff4955    Created pod: portworx-operator-86dff4955-qxvhx
19m         Normal   SuccessfulDelete    replicaset/portworx-operator-86dff4955    Deleted pod: portworx-operator-86dff4955-qxvhx
32m         Normal   ScalingReplicaSet   deployment/portworx-operator              Scaled up replica set portworx-operator-86dff4955 to 1
19m         Normal   ScalingReplicaSet   deployment/portworx-operator              Scaled up replica set portworx-operator-64bb76bbf7 to 1
19m         Normal   ScalingReplicaSet   deployment/portworx-operator              Scaled down replica set portworx-operator-86dff4955 to 0 from 1

Pod (the 1 restart shown is from me manually killing the operator with kill 1, hoping it would get unstuck, but no luck):

➜  ~ k get pods
NAME                                 READY   STATUS    RESTARTS        AGE
portworx-operator-64bb76bbf7-rtpdd   1/1     Running   1 (6m52s ago)   20m

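The problem was that applying the spec file had failed to create storagecluster.core.libopenstorage.org/px-cluster-cb1f4fd8-e0d8-4fcf-9c6f-b1fb4fe33f55 the first time. With no StorageCluster object in the cluster, the operator presumably had nothing to reconcile, which would explain why it sat idle after acquiring the leader lease.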

It’s working now. All good. My bad, sorry :pray: