I have been trying to install Portworx on an on-prem Kubernetes cluster. I was able to generate the spec file from the PX-Central console. I applied the file against my Kubernetes cluster, but the portworx-api pods fail to come up. I then tried to collect the Portworx logs using the command:
kubectl logs -n kube-system -l name=portworx -c portworx --tail=99999
listed in the Troubleshoot Portworx on Kubernetes docs, but the command fails with the error:
Error from server: Get "https://172.23.105.137:10250/containerLogs/kube-system/portworx-5mxmj/portworx?tailLines=99999": dial tcp 172.23.105.137:10250: connect: no route to host
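"No route to host" on port 10250 usually means the kubelet port on that node is not reachable from the machine running kubectl; on CentOS 7 this is often firewalld blocking the port. A quick reachability check is sketched below, using bash's built-in /dev/tcp so no extra tools are needed (the IP is taken from the error message):

```shell
# Check whether the kubelet port (10250) on the affected node is reachable.
# 172.23.105.137 is the node IP reported in the error message above.
if timeout 3 bash -c 'exec 3<>/dev/tcp/172.23.105.137/10250' 2>/dev/null; then
  echo "kubelet port 10250 is reachable"
else
  echo "kubelet port 10250 is unreachable - check firewalld/routing on that node"
fi
```

If the port turns out to be blocked and firewalld is active on the node, opening it with firewall-cmd --permanent --add-port=10250/tcp followed by firewall-cmd --reload (run on the node itself) is a common CentOS 7 fix.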
Env:
K8s version: 1.21.7
Base Machine: CentOS 7
Portworx version: 2.8
Process of installing Portworx on Kubernetes:
- Generated the spec file from PX-Central; did not specify etcd or KVDB values, leaving all parameters at their defaults.
- Applied the px-spec file to the K8s cluster.
- Status of the portworx pods in kube-system:
NAME READY STATUS RESTARTS AGE
portworx-4zrk9 1/2 Running 86 22h
portworx-5mxmj 1/2 Running 86 22h
portworx-api-2kvkt 0/1 Running 0 22h
portworx-api-br254 0/1 Running 0 22h
portworx-api-tkp4b 0/1 Running 0 22h
portworx-slbsd 1/2 Running 86 22h
px-csi-ext-577876dcb8-hmblh 4/4 Running 0 22h
px-csi-ext-577876dcb8-hzjnp 4/4 Running 0 22h
px-csi-ext-577876dcb8-jfd6m 4/4 Running 0 22h
stork-59dfbd5f89-4w7jq 1/1 Running 0 22h
stork-59dfbd5f89-g4l75 1/1 Running 0 22h
stork-59dfbd5f89-qjct8 1/1 Running 0 22h
stork-scheduler-6c5b979799-45vbq 1/1 Running 0 22h
stork-scheduler-6c5b979799-mr62s 1/1 Running 0 22h
stork-scheduler-6c5b979799-r8bts 1/1 Running 0 22h
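The not-ready pods in a listing like the one above can be filtered out quickly; a small sketch (READY is the second column of kubectl get pods output):

```shell
# Print only pods whose ready container count does not match the total.
kubectl get pods -n kube-system --no-headers |
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1, $2, "restarts=" $4 }'
```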
How do I start troubleshooting or collecting logs for Portworx? I can share the output of lsblk or blkid if required.
Describe of one of the portworx-api pods:
[root@localhost ~]# kubectl describe pods/portworx-api-2kvkt -n kube-system
Name: portworx-api-2kvkt
Namespace: kube-system
Priority: 0
Node: workernode2.localdomain/172.23.105.1
Start Time: Thu, 27 Jan 2022 06:46:55 -0800
Labels: controller-revision-hash=db477d449
name=portworx-api
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 172.23.105.1
IPs:
IP: 172.23.105.1
Controlled By: DaemonSet/portworx-api
Containers:
portworx-api:
Container ID: docker://8be2997ab551cde34449d56becb8f2e3211f0887e6501c14db5f81713d3ae564
Image: k8s.gcr.io/pause:3.1
Image ID: docker-pullable://k8s.gcr.io/pause@sha256:f78411e19d84a252e53bff71a4407a5686c46983a2c2eeed83929b888179acea
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 27 Jan 2022 06:47:10 -0800
Ready: False
Restart Count: 0
Readiness: http-get http://127.0.0.1:9001/status delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vkbv9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-vkbv9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m55s (x8101 over 22h) kubelet Readiness probe failed: Get "http://127.0.0.1:9001/status": dial tcp 127.0.0.1:9001: connect: connection refused
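The readiness probe here is a plain HTTP GET against 127.0.0.1:9001, so it can be reproduced by hand on the affected worker node. A sketch (note that portworx-api runs the pause image and relies on the host's Portworx service to answer on :9001, so a refused connection usually points back at the crash-looping portworx pods rather than at this pod itself):

```shell
# Run directly on the node hosting the failing portworx-api pod.
# The readiness probe expects an answer on 127.0.0.1:9001/status.
curl -sS --max-time 3 http://127.0.0.1:9001/status || echo "nothing answering on :9001"
# Show whether any process is actually listening on 9001 (ss ships with CentOS 7).
ss -ltn 'sport = :9001'
```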
FYI: I did label the nodes with px/metadata-node=true. I also provided the KVDB device value as /dev/xvdb, since xvdb is present in my lsblk output, but I am hitting the same error.
Output from one of the worker nodes is attached: journalctl -lu portworx* > node.logs
Looking at the provided spec file, you are telling Portworx to start and consume all unmounted disks. You also mention labeling the nodes and providing a path for the KVDB device.
However, the start command in the log file you attached looks like it is trying to create disks for GKE on GCP. This is the line telling Portworx what to use:
Jan 27 06:42:26 workernode.localdomain portworx[8857]: time="2022-01-27T06:42:26-08:00" level=info msg="PX-RunC arguments: -b -c px-cluster-e0ab5f47-bb74-47cd-acf7-070b0d677ae3 -kvdb_dev type=pd-standard,size=150 -s type=pd-ssd,size=50 -secret_type k8s -x kubernetes"
The type=pd-standard and type=pd-ssd values are GCP persistent-disk volume types, which do not exist on an on-prem cluster.
So it appears the manifest you linked and the logs do not match up.
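To double-check which arguments Portworx actually started with on a given node, that line can be pulled straight out of the journalctl capture (node.logs from the command mentioned earlier):

```shell
# Extract the PX-RunC argument line(s) from the collected node log.
grep -o 'PX-RunC arguments: [^"]*' node.logs
```

If the output still shows type=pd-standard / type=pd-ssd after reapplying a spec, that would suggest the node is still running the old configuration.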
I would suggest running kubectl delete -f <your-px-specfile.yaml> and then going to PX-Central to generate a new cluster configuration. If /dev/xvdb is the only additional device attached, do not specify it as the KVDB device; leave the KVDB device unset and Portworx will use a portion of that drive for the KVDB and the rest for persistent storage.
It may also be necessary to click the star in the upper right of the Central page and disassociate the cluster ID of this failed deployment, if it got far enough to check in with our license servers to validate PX-Essentials.
Good Luck!
Thanks for the response; it looks like I wasn't disassociating the cluster. When I unlinked my old cluster and generated a spec for a fresh one, it worked.