Issues with on-prem installation

Portworx Essentials version: 2.8
Kubernetes version: 1.22.3 (built using kubeadm)

When I run the installation of Portworx Essentials using the operator, not all of the pods start correctly. Can you advise on this?

e.g. the portworx-api pods are in the Running state but have 0 ready containers:

portworx-api-bnr88                                      0/1     Running                 0                 42h
portworx-api-qjhjw                                      0/1     Running                 0                 42h

I’m not able to view any logs from these pods.
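
For reference, this is roughly how I’ve been trying to inspect them (pod name taken from the list above):

kubectl -n kube-system describe pod portworx-api-bnr88
kubectl -n kube-system logs portworx-api-bnr88 --previous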

The px-cluster pods are similarly missing a ready container:

px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-bkpbn   1/2     Running                 164 (57s ago)     42h
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-q8jmc   1/2     Running                 164 (31s ago)     42h

and Lighthouse is failing to init because it can’t reach portworx-service (presumably because the API isn’t up):

kubectl -n kube-system logs px-lighthouse-656b55cfdd-htc6c -c config-init -p
time="2021-11-17T08:10:12Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:10:42 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:10:52Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:11:22 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:11:32Z" level=info msg="Creating new default config for lighthouse"
2021/11/17 08:12:02 Get http://portworx-service:9001/config: dial tcp 10.106.131.105:9001: i/o timeout Next retry in: 10s
time="2021-11-17T08:12:12Z" level=fatal msg="Error initializing lighthouse config. timed out performing task"

Logs from the csi-node-driver-registrar container in the px-cluster pod show that it cannot reach the CSI socket either:

kubectl -n kube-system logs px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-bkpbn -c csi-node-driver-registrar
...{repeated every few secs}
W1117 13:36:23.403311       1 connection.go:172] Still connecting to unix:///csi/csi.sock

The output of the portworx container is large, so I put it in a pastebin: portworx.log - Pastebin.com

I think the issue comes down to the csi-node-driver-registrar container not being able to connect to the CSI socket:

I1125 16:07:28.538410       1 main.go:137] Attempting to open a gRPC connection with: "/csi/csi.sock"
I1125 16:07:28.538448       1 connection.go:153] Connecting to unix:///csi/csi.sock
W1125 16:07:38.538609       1 connection.go:172] Still connecting to unix:///csi/csi.sock
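
To see where that socket is supposed to land on the host, I can dump the hostPath volumes from one of the px-cluster pods (this is just a generic kubectl query; pod name from my cluster):

kubectl -n kube-system get pod px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-bkpbn -o yaml | grep -B1 -A2 hostPath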

What can cause this?

Updated to version 2.9, but no change; I’m still seeing pods in the “Running” state with containers that aren’t ready:

portworx-api-6b6bw                                      0/1     Running    0              112s
portworx-api-d9m5x                                      0/1     Running    0              89s
portworx-api-pv8gd                                      0/1     Running    0              89s
portworx-operator-84488c55c5-5mhzn                      1/1     Running    0              10d
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-gslgv   1/2     Running    0              57s
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-j2hjn   1/2     Running    0              84s
px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-wfqm9   1/2     Running    0              57s
px-csi-ext-5fb4cc4bff-ld5pk                             3/3     Running    0              2m2s
px-csi-ext-5fb4cc4bff-w6t2n                             3/3     Running    0              99s
px-csi-ext-5fb4cc4bff-xc6kd                             3/3     Running    0              99s
px-lighthouse-656b55cfdd-jphgm                          0/3     Init:0/1   1 (38s ago)    2m49s
stork-969ff57d5-htq67                                   1/1     Running    0              3m42s
stork-969ff57d5-hwdht                                   1/1     Running    0              3m19s
stork-969ff57d5-wmp6c                                   1/1     Running    0              3m19s
stork-scheduler-8699d795f9-8sm5j                        1/1     Running    0              3m10s
stork-scheduler-8699d795f9-hlpgw                        1/1     Running    0              3m33s
stork-scheduler-8699d795f9-xr9jl                        1/1     Running    0              3m10s

I still get the warning about the csi-node-driver-registrar connecting to the CSI socket.
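
I checked with the same command as before, just against one of the new pods:

kubectl -n kube-system logs px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-gslgv -c csi-node-driver-registrar --tail=20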

I don’t see the socket open on the node when using ss -x | grep csi. Is there some other component that needs to be installed on the node?
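
For completeness, this is the host-side check I ran, assuming the CSI socket should show up under the kubelet plugins directories (the exact pxd.portworx.com subdirectory is my guess based on the CSI driver name):

sudo ls -la /var/lib/kubelet/plugins/
sudo ls -la /var/lib/kubelet/plugins_registry/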

open-iscsi and multipath-tools are up to date

~$ sudo apt-cache policy open-iscsi
open-iscsi:
  Installed: 2.0.874-7.1ubuntu6.2
  Candidate: 2.0.874-7.1ubuntu6.2
  Version table:
 *** 2.0.874-7.1ubuntu6.2 500
        500 http://gb.archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.874-7.1ubuntu6 500
        500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages

~$ sudo apt-cache policy multipath-tools
multipath-tools:
  Installed: 0.8.3-1ubuntu2
  Candidate: 0.8.3-1ubuntu2
  Version table:
 *** 0.8.3-1ubuntu2 500
        500 http://gb.archive.ubuntu.com/ubuntu focal/main amd64 Packages
        100 /var/lib/dpkg/status

Events from the StorageCluster object:

Events:
  Type     Reason            Age                   From                       Message
  ----     ------            ----                  ----                       -------
  Normal   SuccessfulCreate  20m                   storagecluster-controller  Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-zmqql
  Normal   SuccessfulCreate  20m                   storagecluster-controller  Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-4sgkw
  Normal   SuccessfulCreate  20m                   storagecluster-controller  Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-9mblx
  Normal   SuccessfulCreate  11m                   storagecluster-controller  Created pod: px-cluster-c41aa9d0-1398-415c-b3cc-5de9cc451fe8-6t4kr
  Warning  FailedComponent   6m28s (x13 over 20m)  storagecluster-controller  Failed to setup Monitoring. Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": dial tcp 10.106.178.125:443: i/o timeout
  Warning  FailedComponent   81s (x35 over 21m)    storagecluster-controller  Failed to setup Monitoring. Internal error occurred: failed calling webhook "prometheusrulemutate.monitoring.coreos.com": failed to call webhook: Post "https://prometheus-operator-operator.monitoring.svc:443/admission-prometheusrules/mutate?timeout=10s": context deadline exceeded

This is odd, as I have spec.monitoring.prometheus.enabled: false set in my operator spec.
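
To double-check what the operator actually sees, this is the query I can run against the live StorageCluster (assuming a single StorageCluster object in kube-system):

kubectl -n kube-system get storagecluster -o jsonpath='{.items[0].spec.monitoring}'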

Two warnings on the px-cluster pods:

  Warning  Unhealthy                          3m19s (x72 over 13m)  kubelet   Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  NodeStartFailure                   25s (x9 over 11m)     portworx  Failed to start Portworx: error loading node identity: Cause: ProviderInternal Error: failed to attach volume 8be366ddb6: volume does not exist

It’s complaining that the volume doesn’t exist. What is supposed to create this volume, and how?
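
In case it’s useful, these are the extra checks I can run and post output from (storagenodes is the operator’s per-node CRD; pxctl lives inside the portworx container, so the status call may itself fail while the node is stuck):

kubectl -n kube-system get storagenodes -o wide
kubectl -n kube-system exec px-cluster-e3249027-e17e-4cd6-bba2-48690ca7326f-gslgv -c portworx -- /opt/pwx/bin/pxctl status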