Portworx installation fails with 'Failed to start Portworx: failed to setup internal kvdb'

Hello,

To install Portworx on bare-metal CentOS 7 machines running Kubernetes 1.22.4, I generated a spec file from PX-Central: https://install.portworx.com/2.8?mc=false&kbver=1.22.4&oem=esse&user=4906ceae-ba0a-11eb-a2c5-c24e499c7467&b=true&c=px-cluster-e6af2090-d640-4398-9664-4c51440c7b22&stork=true&csi=true&tel=false&st=k8s

Before applying this spec file, as suggested, I labelled 3 of my worker nodes with px/metadata-node=true.
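For completeness, the labelling was done with the standard kubectl command (the node names below are placeholders for my actual workers):

kubectl label nodes <worker-1> <worker-2> <worker-3> px/metadata-node=true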

I then applied the above-mentioned spec file.

Next I checked the status of the Portworx pods in the kube-system namespace:

portworx-api-7x8jw                              1/1     Running   0          12m
portworx-api-qxpn9                              0/1     Running   0          12m
portworx-api-rflrk                              0/1     Running   0          12m
portworx-px9nx                                  2/2     Running   0          12m
portworx-rw695                                  1/2     Running   0          12m
portworx-xncgj                                  1/2     Running   0          12m
px-csi-ext-577876dcb8-h2xq2                     4/4     Running   0          12m
px-csi-ext-577876dcb8-pmvz7                     4/4     Running   0          12m
px-csi-ext-577876dcb8-qtrrr                     4/4     Running   0          12m
stork-59dfbd5f89-cm4pc                          1/1     Running   0          12m
stork-59dfbd5f89-q4zb5                          1/1     Running   0          12m
stork-59dfbd5f89-q727q                          1/1     Running   0          12m
stork-scheduler-84bfdfc65d-6xvqn                1/1     Running   0          12m
stork-scheduler-84bfdfc65d-bhm49                1/1     Running   0          12m
stork-scheduler-84bfdfc65d-slcxk                1/1     Running   0          12m
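(For reference, a listing like the one above, including the node each pod landed on, can be produced roughly like this; the name=portworx label selector is what the generated spec normally applies to the daemonset pods, so treat it as an assumption:)

kubectl -n kube-system get pods -o wide
kubectl -n kube-system get pods -l name=portworx -o wide   # only the portworx daemonset pods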

As you can see, only one of the 3 portworx pods is healthy and running. To confirm, I ran pxctl status on the node hosting the running pod. Output:

Status: PX is operational
Telemetry: Disabled or Unhealthy
License: PX-Essential (lease renewal in 23h, 59m)

with other details.

On the nodes hosting the other 2 portworx pods, the status is:

PX stopped working 5.3s ago.  Last status: Could not init boot manager  (error="failed to setup internal kvdb: failed to create a kvdb connection to peer internal kvdb nodes [[http://<node_ip>:9019]]: dial tcp <node_ip>:9019: connect: no route to host. Make sure peer kvdb nodes are healthy.")

Here, the <node_ip> is the IP of the node running the healthy portworx pod.
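In case it helps to rule it out: since "no route to host" on CentOS 7 often comes from firewalld rejecting the connection rather than an actual routing problem, I believe the check from a failing node would look roughly like this (a sketch only, assuming firewalld is active and that the documented Portworx port range 9001-9022 applies):

# from a failing node, test the internal kvdb port on the healthy node
nc -zv 172.23.107.35 9019

# if firewalld is running, open the Portworx port range on every node
firewall-cmd --permanent --add-port=9001-9022/tcp
firewall-cmd --reload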

The output of blkid from one of the nodes hosting a failed portworx pod is:

/dev/xvda1: UUID="d7d30b35-6d80-44fa-9ba2-3b1120c6f54b" TYPE="xfs" 
/dev/xvda2: UUID="qHoJuA-6kC3-OA8h-1G8R-S2yF-izdR-iBEo4O" TYPE="LVM2_member" 
/dev/xvdb: LABEL="pxpool=0,mdpoolid=0,mdvol,n=0,u=e9b2c485-c3d6-4ddc-8180-2759894ccb68,i=5a41050d-6b2c-4251-abd9-5ec6f00f68e0" UUID="058e9229-67e5-4342-be94-87c553d9f57c" UUID_SUB="e0566733-1d57-462d-be74-d511ef86f6ab" TYPE="btrfs" 
/dev/mapper/centos-swap: UUID="b0e8210a-435d-4575-9138-ac09c88b1a96" TYPE="swap" 
/dev/mapper/centos-root: UUID="be95eeb5-a64f-40ea-87d9-9e97fad6201e" TYPE="xfs" 
/dev/loop0: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-pool: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-f7bdff1e8ca4e6e1afaa430781631e2cac0736f75b12971ab33b13d9b89c6a0d: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-7edb96a2c9b886f24948ad239e6f506a775d85fe2641d0b44211c9f0553c4f92: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-934f98344cc1cd38eaf99c296d33f89082436b516e98ba441e8f988fd566480e: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-3141e8dbce99ce354713c33e6c066e4585f5fa1822d66d1d4b2fe66936e81db9: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-dde1a3e6094eb98d6c63b22ef239e1b5624319e6f076f89be8bfff0bd2a82c98: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-50cb93eecb2a02ac54f980186268f8f3b7fbf3a689c560083a9f347262b6685e: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-417d3fdfc502f698d9dd76a896580d4e62485a40e80246e9df47ef5aa8dbb58f: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-6f334e5dfc51f46a5d6dbbd0b970e21374a1e13328fce209a24823ea7787db19: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-39466f23e6625d659302a68db1e4e2e169e233e76cce99114ef10516a6f87242: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-36c320b2299ca5e46c7d6302a3c952edf6e3c663336903bc27f85ec8e769d151: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-3417bf4661e429336da36495215eacd087dadd9969a222ac64ae439071e3859e: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-2d7e913bd8dec7926693c879368f24423ead2a656546c6a177b7a464727a533a: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-d366fe7d14d74bd860ef1b3f7148c506e1bbee38992aade999cd147677c3bf97: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-b05c2217a5eed777df168e6c4a5bf76a7219ccd19770829fdd284e2746b90406: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-4b9187c8903682ecd89ba3f8c35f85c5f440eb23f87f12c677c125e988182b0c: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-28c599fb997ed946107b4762ea6bedf0e5eda727fd6de1453751718ad1e48d1c: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-f1f715dad2586d98926807a706370d99fc0873e540da7b9ba9f0799e317ddb4b: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-67031c1c634f7f3747d99bd33a93f3376bdd79d24b320bcfc426d14b5d05d78f: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 
/dev/mapper/docker-253:1-316486-ebb6d09879cd9737d86cf3eef47f05b1b3dcb34aedeea35d444c79187073b6d4: UUID="4f5a6c41-cce0-402c-bdb1-d1c57b23efe1" TYPE="xfs" 

The output of journalctl -lau portworx-output from one of the nodes hosting a failed pod is available at: px1.log (Google Drive)

I’m not really sure what else I could have done differently, and your support would be highly appreciated.

Thanks.

Full output:

[root@workernode0 ~]# pxctl status
PX stopped working 17.2s ago.  Last status: Could not init boot manager  (error="failed to setup internal kvdb: failed to create a kvdb connection to peer internal kvdb nodes [[http://172.23.107.35:9019]]: dial tcp 172.23.107.35:9019: connect: no route to host. Make sure peer kvdb nodes are healthy.")

List of last known failures:

Type	ID			Resource				Severity	Count	LastSeen			FirstSeen			Description
NODE	NodeStartFailure	df6adce4-4599-4995-889a-190146fb01db	ALARM	Feb 22 12:14:08 UTC 2022	Feb 22 12:14:08 UTC 2022	Failed to start Portworx: failed to setup internal kvdb: failed to create a kvdb connection to peer internal kvdb nodes [[http://172.23.107.35:9019]]: dial tcp 172.23.107.35:9019: connect: no route to host. Make sure peer kvdb nodes are healthy.	
NODE	InternalKvdbSetupFailed	df6adce4-4599-4995-889a-190146fb01db	ALARM	Feb 22 12:14:08 UTC 2022	Feb 22 12:14:08 UTC 2022	failed to setup internal kvdb: failed to create a kvdb connection to peer internal kvdb nodes [[http://172.23.107.35:9019]]: dial tcp 172.23.107.35:9019: connect: no route to host. Make sure peer kvdb nodes are healthy.				
NODE	KvdbConnectionFailed	df6adce4-4599-4995-889a-190146fb01db	ALARM	Feb 22 12:14:08 UTC 2022	Feb 22 12:14:08 UTC 2022	Internal Kvdb: failed to create a kvdb connection to peer internal kvdb nodes [[http://172.23.107.35:9019]]: dial tcp 172.23.107.35:9019: connect: no route to host. Make sure peer kvdb nodes are healthy.

Hello Prateek,

It looks like only 1 of the PX nodes was able to start KVDB, so a kvdb cluster quorum was not formed and PX was not able to start.
It could be that the storage device used with Portworx was already formatted / had a file system on it and could not be used.
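As a quick read-only check for existing signatures on the device (assuming the Portworx disk is /dev/xvdb, as in your blkid output), something like this can be run on each node:

lsblk -f /dev/xvdb
wipefs -n /dev/xvdb    # --no-act: reports signatures without erasing anything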

Thanks & Regards
Varun

Hello Varun,

On each node I made sure to unmount one of the disks with sufficient storage and then wiped it, so I don’t think the disks were formatted or had a file system on them. Having said that, I could be wrong, so I’m sharing the output of lsblk, which is the same across all worker nodes.

[root@workernode2 ~]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0              11:0    1 1024M  0 rom  
xvda            202:0    0   35G  0 disk 
|-xvda1         202:1    0  500M  0 part /boot
`-xvda2         202:2    0 34.5G  0 part 
  |-centos-swap 253:0    0  3.5G  0 lvm  
  `-centos-root 253:1    0   31G  0 lvm  /
xvdb            202:16   0   80G  0 disk 

Here I’m assuming Portworx will look at xvdb, which is unmounted, and use it to set up the internal kvdb. So I’m not sure why Portworx is failing.
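If it helps, I can also run this on the node with the healthy Portworx pod to see which nodes ever joined the internal kvdb (assuming the usual pxctl install path):

/opt/pwx/bin/pxctl service kvdb members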

Thanks.