Portworx Essentials version: 2.8
Kubernetes version: 1.22.3 (built using kubeadm)
When I install Portworx Essentials using the operator, not all of the pods start correctly. Can you advise?
For example, the portworx-api pods are in the Running state but have 0 ready containers:
portworx-api-bnr88 0/1 Running 0 42h
portworx-api-qjhjw 0/1 Running 0 42h
I'm not able to view any logs from these pods.
The px-cluster pods are similarly missing a ready container:
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-bkpbn 1/2 Running 164 (57s ago) 42h
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-q8jmc 1/2 Running 164 (31s ago) 42h
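For completeness, this is roughly how I've been trying to inspect the pods (a sketch; the pod name is one of the examples above, and the portworx-api container name is my assumption):

```shell
# Sketch: ways to get more detail when a pod shows 0/1 Running.
# Pod and container names are examples/assumptions; adjust as needed.
NS=kube-system
POD=portworx-api-bnr88

# Only run the queries when kubectl is actually available on this machine.
if command -v kubectl >/dev/null 2>&1; then
  # Events and per-container state (waiting/terminated reasons):
  kubectl -n "$NS" describe pod "$POD"
  # Logs from the previous container instance, if it restarted:
  kubectl -n "$NS" logs "$POD" --previous
  # Logs from a specific container (name assumed to be portworx-api):
  kubectl -n "$NS" logs "$POD" -c portworx-api
else
  echo "kubectl not found; commands shown for reference only"
fi
```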
Lighthouse is failing to initialise because it can't reach portworx-service (presumably because the API isn't up):
kubectl -n kube-system logs px-lighthouse-656b55cfdd-htc6c -c config-init -p
time="2021-11-17T08:10:12Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:10:42 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:10:52Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:11:22 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:11:32Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:12:02 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:12:12Z" level=fatal msg="Error initializing lighthouse config. timed out performing task"
Logs from the csi-node-driver-registrar container in the px-cluster pods show a connection failure as well:
kubectl -n kube-system logs px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-bkpbn -c csi-node-driver-registrar
... (repeated every few seconds)
W1117 13:36:23.403311 1 connection.go:172] Still connecting to unix:///csi/csi.sock
The output of the portworx container is large, so I've put it in a pastebin: portworx.log - Pastebin.com
I think the issue comes down to the csi-node-driver-registrar container not being able to connect to the CSI socket:
I1125 16:07:28.538410 1 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1125 16:07:28.538448 1 connection.go:153] Connecting to unix:///csi/csi.sock
W1125 16:07:38.538609 1 connection.go:172] Still connecting to unix:///csi/csi.sock
What can cause this?
I've updated to version 2.9, but there's no change. I'm still seeing pods in the Running state with containers that aren't ready:
portworx-api-6b6bw 0/1 Running 0 112s
portworx-api-d9m5x 0/1 Running 0 89s
portworx-api-pv8gd 0/1 Running 0 89s
portworx-operator-84488c55c5-5mhzn 1/1 Running 0 10d
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-gslgv 1/2 Running 0 57s
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-j2hjn 1/2 Running 0 84s
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-wfqm9 1/2 Running 0 57s
px-csi-ext-5fb4cc4bff-ld5pk 3/3 Running 0 2m2s
px-csi-ext-5fb4cc4bff-w6t2n 3/3 Running 0 99s
px-csi-ext-5fb4cc4bff-xc6kd 3/3 Running 0 99s
px-lighthouse-656b55cfdd-jphgm 0/3 Init:0/1 1 (38s ago) 2m49s
stork-969ff57d5-htq67 1/1 Running 0 3m42s
stork-969ff57d5-hwdht 1/1 Running 0 3m19s
stork-969ff57d5-wmp6c 1/1 Running 0 3m19s
stork-scheduler-8699d795f9-8sm5j 1/1 Running 0 3m10s
stork-scheduler-8699d795f9-hlpgw 1/1 Running 0 3m33s
stork-scheduler-8699d795f9-xr9jl 1/1 Running 0 3m10s
I still get the warning about connecting to the CSI socket, and I don't see the socket open on the node when running ss -x | grep csi.
Is there some other component that needs installing on the node?
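Since ss -x only lists sockets that are currently open, I also checked for the socket file itself on the host. A sketch, assuming the driver registers under the standard kubelet plugins directory (the pxd.portworx.com path is my guess; adjust if your kubelet root differs):

```shell
# Sketch: check whether the CSI socket file exists on the node.
# The plugin directory path is an assumption based on the usual
# kubelet plugins layout.
CSI_SOCK=/var/lib/kubelet/plugins/pxd.portworx.com/csi.sock

if [ -S "$CSI_SOCK" ]; then
  echo "socket present: $CSI_SOCK"
else
  echo "socket missing: $CSI_SOCK"
fi
```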
open-iscsi and multipath-tools are up to date:
~$ sudo apt-cache policy open-iscsi
open-iscsi:
Installed: 2.0.874-7.1ubuntu6.2
Candidate: 2.0.874-7.1ubuntu6.2
Version table:
*** 2.0.874-7.1ubuntu6.2 500
500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
100 /var/lib/dpkg/status
2.0.874-7.1ubuntu6 500
500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
~$ sudo apt-cache policy multipath-tools
multipath-tools:
Installed: 0.8.3-1ubuntu2
Candidate: 0.8.3-1ubuntu2
Version table:
*** 0.8.3-1ubuntu2 500
500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status
Events from the StorageCluster object:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 20m storagecluster-controller Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-zmqql
Normal SuccessfulCreate 20m storagecluster-controller Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-4sgkw
Normal SuccessfulCreate 20m storagecluster-controller Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-9mblx
Normal SuccessfulCreate 11m storagecluster-controller Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-6t4kr
Warning FailedComponent 6m28s (x13 over 20m) storagecluster-controller Failed to setup Monitoring. Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.106.178.125:443: i/o timeout
Warning FailedComponent 81s (x35 over 21m) storagecluster-controller Failed to setup Monitoring. Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": context deadline exceeded
This is odd, as I have spec.monitoring.prometheus.enabled: false set in my StorageCluster spec.
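For reference, the relevant fragment of my StorageCluster spec, trimmed to the monitoring section (cluster name and namespace match the pods above):

```yaml
# StorageCluster fragment (trimmed): monitoring is disabled here,
# yet the controller still tries the Prometheus admission webhook.
apiVersion: core.libopenstorage.org/v1
kind: StorageCluster
metadata:
  name: px-cluster
  namespace: kube-system
spec:
  monitoring:
    prometheus:
      enabled: false
```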
Two warnings on the px-cluster pods:
Warning Unhealthy 3m19s (x72 over 13m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Warning NodeStartFailure 25s (x9 over 11m) portworx Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: failed to attach volume 8be366ddb6: volume does not exist
It's complaining that the volume doesn't exist. What should create this volume, and how?
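In case it helps with diagnosis, this is roughly how I'd pull node status from inside the portworx container (a sketch; the pod name is an example from the listing above, and I'm assuming pxctl lives at its usual /opt/pwx/bin path):

```shell
# Sketch: query Portworx node status for more detail on the
# "volume does not exist" error. Pod name is an example from above;
# the pxctl path inside the container is an assumption.
NS=kube-system
POD=px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-gslgv

if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NS" exec "$POD" -c portworx -- /opt/pwx/bin/pxctl status
else
  echo "kubectl not found; command shown for reference only"
fi
```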