Recover data from a failed cluster

Hello @dgrekov

Since Kubernetes is down, we will have to perform a few manual steps to override the configuration that was set when Portworx was installed through Kubernetes. I will walk you through the steps here. Before executing the steps below, I recommend stopping Portworx on all the nodes with systemctl stop portworx.

  1. Extract the command that was used to install Portworx through Kubernetes
python -c 'import json; aa=json.loads(json.load(open("/opt/pwx/oci/config.json"))["annotations"]["PxArgs"]); print("/opt/pwx/bin/px-runc install \"%s\"" % "\" \"".join(aa))'

The above Python one-liner will print the actual command that was used to install Portworx. It will look something like this:

/opt/pwx/bin/px-runc install "-c" "<cluster-id>" "-x" "kubernetes" "-b" ......
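For context, the one-liner works because the OCI config stores the install arguments as a JSON-encoded list inside an annotation. A minimal sketch of the same logic, assuming config.json has the shape {"annotations": {"PxArgs": "<json list>"}} (the helper name is ours, not a Portworx API):

```python
import json

def extract_install_cmd(path="/opt/pwx/oci/config.json"):
    """Rebuild the px-runc install command from the OCI config annotations."""
    with open(path) as f:
        cfg = json.load(f)
    # PxArgs is itself a JSON string holding the original argument list
    args = json.loads(cfg["annotations"]["PxArgs"])
    quoted = " ".join('"%s"' % a for a in args)
    return "/opt/pwx/bin/px-runc install %s" % quoted
```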

  2. Modify the dumped px-runc install command to suit your current state
  • Remove "-x" "kubernetes" from the command, since Kubernetes is down and we don’t want to depend on it.
  • Add "-k" "etcd:http://<etcd-ip>:<etcd-port>" to the command.

After these modifications, your command will look something like this:

/opt/pwx/bin/px-runc install "-c" "<cluster-id>" "-b" "-k" "etcd:http://<etcd-ip>:<etcd-port>" ......
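The two edits above can be sketched programmatically against the dumped argument list (a hypothetical helper, with a placeholder etcd endpoint):

```python
def adapt_args(args, etcd_endpoint):
    """Drop '-x kubernetes' and append the '-k <kvdb>' endpoint."""
    out = list(args)
    # remove the "-x" flag together with its value
    if "-x" in out:
        i = out.index("-x")
        del out[i:i + 2]
    # point Portworx directly at the kvdb instead of going through k8s
    out += ["-k", etcd_endpoint]
    return out
```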

  3. Run the modified px-runc install command on all the nodes.

  4. Find the latest .dump file and rename it to pwx_kvdb_disaster_recovery_golden.dump. You need to do this only on the one node that has the latest .dump file.

  5. Start the Portworx service on all nodes: systemctl start portworx
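The dump-file renaming step above can be sketched as follows. This is a hypothetical helper: dump_dir is an assumption, so point it at the directory where you actually find the .dump files.

```python
import pathlib

def pick_golden(dump_dir):
    """Rename the newest *.dump file to the recovery 'golden' name."""
    dumps = sorted(pathlib.Path(dump_dir).glob("*.dump"),
                   key=lambda p: p.stat().st_mtime)
    if not dumps:
        return None  # no dump files on this node
    latest = dumps[-1]
    return latest.rename(
        latest.with_name("pwx_kvdb_disaster_recovery_golden.dump"))
```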

Let us know if you face any issues.

Progress! Not 100% yet, but getting there. Details below.
Should I remove any env vars?

node3 dgrekov # /opt/pwx/bin/px-runc install -c px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7 -A -f -secret_type k8s -j auto -b -k etcd:http://etcd.dimagre.com:2379 -v /var/lib/kubelet:/var/lib/kubelet:shared -v /dev:/dev -v /opt/pwx/oci/mounts/etc/hosts:/etc/hosts -v /opt/pwx/oci/mounts/etc/resolv.conf:/etc/resolv.conf -v /opt/pwx/oci/mounts/tmp/px-termination-log:/tmp/px-termination-log -v /var/cores:/var/cores -v /var/run/dbus:/var/run/dbus -v /opt/pwx/oci/mounts/var/run/secrets/kubernetes.io/serviceaccount:/var/run/secrets/kubernetes.io/serviceaccount -e ALERTMANAGER_PORTWORX_PORT=tcp://10.3.155.150:9093 -e ALERTMANAGER_PORTWORX_PORT_9093_TCP=tcp://10.3.155.150:9093 -e ALERTMANAGER_PORTWORX_PORT_9093_TCP_ADDR=10.3.155.150 -e ALERTMANAGER_PORTWORX_PORT_9093_TCP_PORT=9093 -e ALERTMANAGER_PORTWORX_PORT_9093_TCP_PROTO=tcp -e ALERTMANAGER_PORTWORX_SERVICE_HOST=10.3.155.150 -e ALERTMANAGER_PORTWORX_SERVICE_PORT=9093 -e ALERTMANAGER_PORTWORX_SERVICE_PORT_WEB=9093 -e AUTOPILOT_PORT=tcp://10.3.166.6:9628 -e AUTOPILOT_PORT_9628_TCP=tcp://10.3.166.6:9628 -e AUTOPILOT_PORT_9628_TCP_ADDR=10.3.166.6 -e AUTOPILOT_PORT_9628_TCP_PORT=9628 -e AUTOPILOT_PORT_9628_TCP_PROTO=tcp -e AUTOPILOT_SERVICE_HOST=10.3.166.6 -e AUTOPILOT_SERVICE_PORT=9628 -e AUTOPILOT_SERVICE_PORT_AUTOPILOT=9628 -e AUTO_NODE_RECOVERY_TIMEOUT_IN_SECS=1500 -e COREDNS_PORT=udp://10.3.0.10:53 -e COREDNS_PORT_53_TCP=tcp://10.3.0.10:53 -e COREDNS_PORT_53_TCP_ADDR=10.3.0.10 -e COREDNS_PORT_53_TCP_PORT=53 -e COREDNS_PORT_53_TCP_PROTO=tcp -e COREDNS_PORT_53_UDP=udp://10.3.0.10:53 -e COREDNS_PORT_53_UDP_ADDR=10.3.0.10 -e COREDNS_PORT_53_UDP_PORT=53 -e COREDNS_PORT_53_UDP_PROTO=udp -e COREDNS_SERVICE_HOST=10.3.0.10 -e COREDNS_SERVICE_PORT=53 -e COREDNS_SERVICE_PORT_DNS=53 -e COREDNS_SERVICE_PORT_DNS_TCP=53 -e CSI_ENDPOINT=unix:///var/lib/kubelet/plugins/pxd.portworx.com/csi.sock -e KUBERNETES_PORT=tcp://10.3.0.1:443 -e KUBERNETES_PORT_443_TCP=tcp://10.3.0.1:443 -e KUBERNETES_PORT_443_TCP_ADDR=10.3.0.1 -e 
KUBERNETES_PORT_443_TCP_PORT=443 -e KUBERNETES_PORT_443_TCP_PROTO=tcp -e KUBERNETES_SERVICE_HOST=10.3.0.1 -e KUBERNETES_SERVICE_PORT=443 -e KUBERNETES_SERVICE_PORT_HTTPS=443 -e PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin -e PORTWORX_API_PORT=tcp://10.3.224.150:9001 -e PORTWORX_API_PORT_9001_TCP=tcp://10.3.224.150:9001 -e PORTWORX_API_PORT_9001_TCP_ADDR=10.3.224.150 -e PORTWORX_API_PORT_9001_TCP_PORT=9001 -e PORTWORX_API_PORT_9001_TCP_PROTO=tcp -e PORTWORX_API_PORT_9020_TCP=tcp://10.3.224.150:9020 -e PORTWORX_API_PORT_9020_TCP_ADDR=10.3.224.150 -e PORTWORX_API_PORT_9020_TCP_PORT=9020 -e PORTWORX_API_PORT_9020_TCP_PROTO=tcp -e PORTWORX_API_PORT_9021_TCP=tcp://10.3.224.150:9021 -e PORTWORX_API_PORT_9021_TCP_ADDR=10.3.224.150 -e PORTWORX_API_PORT_9021_TCP_PORT=9021 -e PORTWORX_API_PORT_9021_TCP_PROTO=tcp -e PORTWORX_API_SERVICE_HOST=10.3.224.150 -e PORTWORX_API_SERVICE_PORT=9001 -e PORTWORX_API_SERVICE_PORT_PX_API=9001 -e PORTWORX_API_SERVICE_PORT_PX_REST_GATEWAY=9021 -e PORTWORX_API_SERVICE_PORT_PX_SDK=9020 -e PORTWORX_SERVICE_PORT=tcp://10.3.138.68:9001 -e PORTWORX_SERVICE_PORT_9001_TCP=tcp://10.3.138.68:9001 -e PORTWORX_SERVICE_PORT_9001_TCP_ADDR=10.3.138.68 -e PORTWORX_SERVICE_PORT_9001_TCP_PORT=9001 -e PORTWORX_SERVICE_PORT_9001_TCP_PROTO=tcp -e PORTWORX_SERVICE_PORT_9019_TCP=tcp://10.3.138.68:9019 -e PORTWORX_SERVICE_PORT_9019_TCP_ADDR=10.3.138.68 -e PORTWORX_SERVICE_PORT_9019_TCP_PORT=9019 -e PORTWORX_SERVICE_PORT_9019_TCP_PROTO=tcp -e PORTWORX_SERVICE_PORT_9020_TCP=tcp://10.3.138.68:9020 -e PORTWORX_SERVICE_PORT_9020_TCP_ADDR=10.3.138.68 -e PORTWORX_SERVICE_PORT_9020_TCP_PORT=9020 -e PORTWORX_SERVICE_PORT_9020_TCP_PROTO=tcp -e PORTWORX_SERVICE_PORT_9021_TCP=tcp://10.3.138.68:9021 -e PORTWORX_SERVICE_PORT_9021_TCP_ADDR=10.3.138.68 -e PORTWORX_SERVICE_PORT_9021_TCP_PORT=9021 -e PORTWORX_SERVICE_PORT_9021_TCP_PROTO=tcp -e PORTWORX_SERVICE_SERVICE_HOST=10.3.138.68 -e PORTWORX_SERVICE_SERVICE_PORT=9001 -e 
PORTWORX_SERVICE_SERVICE_PORT_PX_API=9001 -e PORTWORX_SERVICE_SERVICE_PORT_PX_KVDB=9019 -e PORTWORX_SERVICE_SERVICE_PORT_PX_REST_GATEWAY=9021 -e PORTWORX_SERVICE_SERVICE_PORT_PX_SDK=9020 -e PROMETHEUS_PORT=tcp://10.3.34.111:9090 -e PROMETHEUS_PORT_9090_TCP=tcp://10.3.34.111:9090 -e PROMETHEUS_PORT_9090_TCP_ADDR=10.3.34.111 -e PROMETHEUS_PORT_9090_TCP_PORT=9090 -e PROMETHEUS_PORT_9090_TCP_PROTO=tcp -e PROMETHEUS_SERVICE_HOST=10.3.34.111 -e PROMETHEUS_SERVICE_PORT=9090 -e PROMETHEUS_SERVICE_PORT_WEB=9090 -e PX_LIGHTHOUSE_PORT=tcp://10.3.29.146:80 -e PX_LIGHTHOUSE_PORT_443_TCP=tcp://10.3.29.146:443 -e PX_LIGHTHOUSE_PORT_443_TCP_ADDR=10.3.29.146 -e PX_LIGHTHOUSE_PORT_443_TCP_PORT=443 -e PX_LIGHTHOUSE_PORT_443_TCP_PROTO=tcp -e PX_LIGHTHOUSE_PORT_80_TCP=tcp://10.3.29.146:80 -e PX_LIGHTHOUSE_PORT_80_TCP_ADDR=10.3.29.146 -e PX_LIGHTHOUSE_PORT_80_TCP_PORT=80 -e PX_LIGHTHOUSE_PORT_80_TCP_PROTO=tcp -e PX_LIGHTHOUSE_SERVICE_HOST=10.3.29.146 -e PX_LIGHTHOUSE_SERVICE_PORT=80 -e PX_LIGHTHOUSE_SERVICE_PORT_HTTP=80 -e PX_LIGHTHOUSE_SERVICE_PORT_HTTPS=443 -e PX_TEMPLATE_VERSION=v4 -e SEALED_SECRETS_CONTROLLER_PORT=tcp://10.3.127.66:8080 -e SEALED_SECRETS_CONTROLLER_PORT_8080_TCP=tcp://10.3.127.66:8080 -e SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_ADDR=10.3.127.66 -e SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PORT=8080 -e SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PROTO=tcp -e SEALED_SECRETS_CONTROLLER_SERVICE_HOST=10.3.127.66 -e SEALED_SECRETS_CONTROLLER_SERVICE_PORT=8080 -e STORK_SERVICE_PORT=tcp://10.3.4.152:8099 -e STORK_SERVICE_PORT_443_TCP=tcp://10.3.4.152:443 -e STORK_SERVICE_PORT_443_TCP_ADDR=10.3.4.152 -e STORK_SERVICE_PORT_443_TCP_PORT=443 -e STORK_SERVICE_PORT_443_TCP_PROTO=tcp -e STORK_SERVICE_PORT_8099_TCP=tcp://10.3.4.152:8099 -e STORK_SERVICE_PORT_8099_TCP_ADDR=10.3.4.152 -e STORK_SERVICE_PORT_8099_TCP_PORT=8099 -e STORK_SERVICE_PORT_8099_TCP_PROTO=tcp -e STORK_SERVICE_SERVICE_HOST=10.3.4.152 -e STORK_SERVICE_SERVICE_PORT=8099 -e STORK_SERVICE_SERVICE_PORT_EXTENDER=8099 -e 
STORK_SERVICE_SERVICE_PORT_WEBHOOK=443 -e container=oci -e PX_IMAGE=portworx/px-essentials:2.6.1.4 -e CONTAINER_RUNTIME=docker -e PX_IMAGE_DIGEST=sha256:55d0411bfd033b4a98356b90efbc8ba4a0ca97418374b551d7d8d36b31888d50 -e KUBELET_DIR=/var/lib/kubelet
INFO[0000] Rootfs found at /opt/pwx/oci/rootfs          
INFO[0000] PX binaries found at /opt/pwx/bin/px-runc    
INFO[0000] Initializing as version 2.6.1.4-775a586 (OCI) 
INFO[0000] Enabling Sharedv4 NFS support ...            
INFO[0000] Setting up NFS service                       
INFO[0000] > Initialized service controls via DBus{type:dbus,svc:nfs-server.service,id:0xc420516260} 
INFO[0000] Fixing docker.sock mount:                    
INFO[0000] > Removing mount for /var/run/docker.sock:/var/run/docker.sock:[rbind rprivate] 
INFO[0000] > Adding mount for /run:/var/host_run:[bind rprivate] 
INFO[0000] > Soft-link /opt/pwx/oci/rootfs/run/docker.sock -> /var/host_run/docker.sock already exists 
INFO[0000] Checking mountpoints for following shared directories: [/var/lib/kubelet /var/lib/osd] 
INFO[0000] Found following mountpoints for shared dirs: map[/var/lib/kubelet:{isMP=f,Opts=shared:1,Parent=/} /:{isMP=T,Opts=shared:1} /var/lib/osd:{isMP=f,Opts=shared:1,Parent=/}] 
INFO[0000] SPEC UNCHANGED [8ad01aa4095b1424836bcb8f54d0d757 /opt/pwx/oci/config.json] 
WARN[0000] Could not link /opt/pwx/bin/pxctl to /usr/local/bin  error="symlink pxctl to /usr/local/bin/pxctl failed: symlink /opt/pwx/bin/pxctl /usr/local/bin/pxctl: read-only file system"
WARN[0000] Could not link /opt/pwx/bin/pxctl to /usr/bin  error="symlink pxctl to /usr/bin/pxctl failed: symlink /opt/pwx/bin/pxctl /usr/bin/pxctl: read-only file system"
INFO[0000] PX-RunC arguments: -A -b -c px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7 -f -j auto -k etcd:http://etcd.dimagre.com:2379 -secret_type k8s 
INFO[0000] PX-RunC mounts: /dev:/dev /etc/exports:/etc/exports /opt/pwx/oci/mounts/etc/hosts:/etc/hosts /etc/iscsi:/etc/iscsi /etc/mdadm:/etc/mdadm /etc/nvme:/etc/nvme /etc/nvmet:/etc/nvmet /etc/pwx:/etc/pwx /opt/pwx/oci/mounts/etc/resolv.conf:/etc/resolv.conf /etc/target:/etc/target /opt/pwx/bin:/export_bin /proc:/hostproc /lib/modules:/lib/modules proc:/proc:nosuid,noexec,nodev /run/docker:/run/docker /run/lock/iscsi:/run/lock/iscsi /run/log/journal:/run/log/journal:ro /run/lvm:/run/lvm /run/mdadm:/run/mdadm /run/udev:/run/udev sysfs:/sys:nosuid,noexec,nodev cgroup:/sys/fs/cgroup:nosuid,noexec,nodev /opt/pwx/oci/mounts/tmp/px-termination-log:/tmp/px-termination-log /var/cores:/var/cores /run:/var/host_run:bind /var/lib/iscsi:/var/lib/iscsi /var/lib/kubelet:/var/lib/kubelet:shared /var/lib/nfs:/var/lib/nfs /var/lib/osd:/var/lib/osd:shared /var/lock/iscsi:/var/lock/iscsi /var/log/journal:/var/log/journal:ro /var/run/dbus:/var/run/dbus /opt/pwx/oci/mounts/var/run/secrets/kubernetes.io/serviceaccount:/var/run/secrets/kubernetes.io/serviceaccount 
INFO[0000] PX-RunC env: ALERTMANAGER_PORTWORX_PORT=tcp://10.3.155.150:9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP=tcp://10.3.155.150:9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP_ADDR=10.3.155.150 ALERTMANAGER_PORTWORX_PORT_9093_TCP_PORT=9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP_PROTO=tcp ALERTMANAGER_PORTWORX_SERVICE_HOST=10.3.155.150 ALERTMANAGER_PORTWORX_SERVICE_PORT=9093 ALERTMANAGER_PORTWORX_SERVICE_PORT_WEB=9093 AUTOPILOT_PORT=tcp://10.3.166.6:9628 AUTOPILOT_PORT_9628_TCP=tcp://10.3.166.6:9628 AUTOPILOT_PORT_9628_TCP_ADDR=10.3.166.6 AUTOPILOT_PORT_9628_TCP_PORT=9628 AUTOPILOT_PORT_9628_TCP_PROTO=tcp AUTOPILOT_SERVICE_HOST=10.3.166.6 AUTOPILOT_SERVICE_PORT=9628 AUTOPILOT_SERVICE_PORT_AUTOPILOT=9628 AUTO_NODE_RECOVERY_TIMEOUT_IN_SECS=1500 CONTAINER_RUNTIME=docker COREDNS_PORT=udp://10.3.0.10:53 COREDNS_PORT_53_TCP=tcp://10.3.0.10:53 COREDNS_PORT_53_TCP_ADDR=10.3.0.10 COREDNS_PORT_53_TCP_PORT=53 COREDNS_PORT_53_TCP_PROTO=tcp COREDNS_PORT_53_UDP=udp://10.3.0.10:53 COREDNS_PORT_53_UDP_ADDR=10.3.0.10 COREDNS_PORT_53_UDP_PORT=53 COREDNS_PORT_53_UDP_PROTO=udp COREDNS_SERVICE_HOST=10.3.0.10 COREDNS_SERVICE_PORT=53 COREDNS_SERVICE_PORT_DNS=53 COREDNS_SERVICE_PORT_DNS_TCP=53 CSI_ENDPOINT=unix:///var/lib/kubelet/plugins/pxd.portworx.com/csi.sock GOMAXPROCS=64 GOTRACEBACK=crash KUBELET_DIR=/var/lib/kubelet KUBERNETES_PORT=tcp://10.3.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.3.0.1:443 KUBERNETES_PORT_443_TCP_ADDR=10.3.0.1 KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_SERVICE_HOST=10.3.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 LVM_USE_HOST=1 NFS_SERVICE=nfs-server.service PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PORTWORX_API_PORT=tcp://10.3.224.150:9001 PORTWORX_API_PORT_9001_TCP=tcp://10.3.224.150:9001 PORTWORX_API_PORT_9001_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9001_TCP_PORT=9001 PORTWORX_API_PORT_9001_TCP_PROTO=tcp PORTWORX_API_PORT_9020_TCP=tcp://10.3.224.150:9020 
PORTWORX_API_PORT_9020_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9020_TCP_PORT=9020 PORTWORX_API_PORT_9020_TCP_PROTO=tcp PORTWORX_API_PORT_9021_TCP=tcp://10.3.224.150:9021 PORTWORX_API_PORT_9021_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9021_TCP_PORT=9021 PORTWORX_API_PORT_9021_TCP_PROTO=tcp PORTWORX_API_SERVICE_HOST=10.3.224.150 PORTWORX_API_SERVICE_PORT=9001 PORTWORX_API_SERVICE_PORT_PX_API=9001 PORTWORX_API_SERVICE_PORT_PX_REST_GATEWAY=9021 PORTWORX_API_SERVICE_PORT_PX_SDK=9020 PORTWORX_SERVICE_PORT=tcp://10.3.138.68:9001 PORTWORX_SERVICE_PORT_9001_TCP=tcp://10.3.138.68:9001 PORTWORX_SERVICE_PORT_9001_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9001_TCP_PORT=9001 PORTWORX_SERVICE_PORT_9001_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9019_TCP=tcp://10.3.138.68:9019 PORTWORX_SERVICE_PORT_9019_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9019_TCP_PORT=9019 PORTWORX_SERVICE_PORT_9019_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9020_TCP=tcp://10.3.138.68:9020 PORTWORX_SERVICE_PORT_9020_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9020_TCP_PORT=9020 PORTWORX_SERVICE_PORT_9020_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9021_TCP=tcp://10.3.138.68:9021 PORTWORX_SERVICE_PORT_9021_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9021_TCP_PORT=9021 PORTWORX_SERVICE_PORT_9021_TCP_PROTO=tcp PORTWORX_SERVICE_SERVICE_HOST=10.3.138.68 PORTWORX_SERVICE_SERVICE_PORT=9001 PORTWORX_SERVICE_SERVICE_PORT_PX_API=9001 PORTWORX_SERVICE_SERVICE_PORT_PX_KVDB=9019 PORTWORX_SERVICE_SERVICE_PORT_PX_REST_GATEWAY=9021 PORTWORX_SERVICE_SERVICE_PORT_PX_SDK=9020 PROMETHEUS_PORT=tcp://10.3.34.111:9090 PROMETHEUS_PORT_9090_TCP=tcp://10.3.34.111:9090 PROMETHEUS_PORT_9090_TCP_ADDR=10.3.34.111 PROMETHEUS_PORT_9090_TCP_PORT=9090 PROMETHEUS_PORT_9090_TCP_PROTO=tcp PROMETHEUS_SERVICE_HOST=10.3.34.111 PROMETHEUS_SERVICE_PORT=9090 PROMETHEUS_SERVICE_PORT_WEB=9090 PX_IMAGE=portworx/px-essentials:2.6.1.4 PX_IMAGE_DIGEST=sha256:55d0411bfd033b4a98356b90efbc8ba4a0ca97418374b551d7d8d36b31888d50 PX_LIGHTHOUSE_PORT=tcp://10.3.29.146:80 
PX_LIGHTHOUSE_PORT_443_TCP=tcp://10.3.29.146:443 PX_LIGHTHOUSE_PORT_443_TCP_ADDR=10.3.29.146 PX_LIGHTHOUSE_PORT_443_TCP_PORT=443 PX_LIGHTHOUSE_PORT_443_TCP_PROTO=tcp PX_LIGHTHOUSE_PORT_80_TCP=tcp://10.3.29.146:80 PX_LIGHTHOUSE_PORT_80_TCP_ADDR=10.3.29.146 PX_LIGHTHOUSE_PORT_80_TCP_PORT=80 PX_LIGHTHOUSE_PORT_80_TCP_PROTO=tcp PX_LIGHTHOUSE_SERVICE_HOST=10.3.29.146 PX_LIGHTHOUSE_SERVICE_PORT=80 PX_LIGHTHOUSE_SERVICE_PORT_HTTP=80 PX_LIGHTHOUSE_SERVICE_PORT_HTTPS=443 PX_LOGLEVEL=info PX_RUNC=true PX_SHARED=/var/lib/kubelet:shared:1;/var/lib/osd:shared:1 PX_TEMPLATE_VERSION=v4 PX_VERSION=2.6.1.4-775a586 SEALED_SECRETS_CONTROLLER_PORT=tcp://10.3.127.66:8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP=tcp://10.3.127.66:8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_ADDR=10.3.127.66 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PORT=8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PROTO=tcp SEALED_SECRETS_CONTROLLER_SERVICE_HOST=10.3.127.66 SEALED_SECRETS_CONTROLLER_SERVICE_PORT=8080 STORK_SERVICE_PORT=tcp://10.3.4.152:8099 STORK_SERVICE_PORT_443_TCP=tcp://10.3.4.152:443 STORK_SERVICE_PORT_443_TCP_ADDR=10.3.4.152 STORK_SERVICE_PORT_443_TCP_PORT=443 STORK_SERVICE_PORT_443_TCP_PROTO=tcp STORK_SERVICE_PORT_8099_TCP=tcp://10.3.4.152:8099 STORK_SERVICE_PORT_8099_TCP_ADDR=10.3.4.152 STORK_SERVICE_PORT_8099_TCP_PORT=8099 STORK_SERVICE_PORT_8099_TCP_PROTO=tcp STORK_SERVICE_SERVICE_HOST=10.3.4.152 STORK_SERVICE_SERVICE_PORT=8099 STORK_SERVICE_SERVICE_PORT_EXTENDER=8099 STORK_SERVICE_SERVICE_PORT_WEBHOOK=443 TERM=xterm container=oci 
INFO[0000] /etc/systemd/system/portworx-reboot.service content unchanged [1dc97b965f3c6ad99aa3a92a02b2e8b1 /etc/systemd/system/portworx-reboot.service] 
INFO[0000] /etc/systemd/system/portworx.socket content unchanged [e80e04204b7a7d113db36c53f420d635 /etc/systemd/system/portworx.socket] 
INFO[0000] /etc/systemd/system/portworx-output.service content unchanged [7340d8be39a32f3b7d296ac8275bc2e1 /etc/systemd/system/portworx-output.service] 
INFO[0000] /etc/systemd/system/portworx.service content unchanged [4c6515ccf9c8cb3f81745795a981723c /etc/systemd/system/portworx.service] 
node3 dgrekov # /opt/pwx/bin/pxctl status
PX stopped working 1h18m2.5s ago.  Last status: Could not init boot manager  (error="Failed to initialize k8s bootstrap: Failed to create configmap px-bootstrap-pxcluster19235250e7384df6b6e3be7a12a535b7: Post https://10.3.0.1:443/api/v1/namespaces/kube-system/configmaps: dial tcp 10.3.0.1:443: connect: connection refused")
node3 dgrekov #

Those env variables should be fine for now.
The install command only sets up the configuration files for Portworx. To start Portworx, you need to run systemctl start portworx.

Also, to perform the kvdb recovery, you will need to choose a dump file and then start Portworx as mentioned in my previous post.

That makes sense; not sure why I missed that step. Thank you. I’ve started up the service on one node… let me know if I should run the install command or start the service on the other nodes.

journalctl shows the Portworx unit running and not crashing.

This is what I get now:

**node3** **dgrekov #** /opt/pwx/bin/pxctl status
PX is not running on host: Could not reach 'HealthMonitor'

Can you share the logs from node3? Also, just confirming: did you find the latest dump on node3 and rename it to pwx_kvdb_disaster_recovery_golden.dump?

You will need to run the px-runc install command and then systemctl start portworx on all nodes.

Done and done, same effect so far. And confirmed on the dump file.

**dgrekov@node3** **~ $** sudo journalctl -f
-- Logs begin at Tue 2021-01-05 16:41:37 UTC. --
Jan 12 22:59:35 node3.dimagre.com systemd[1]: portworx.service: Succeeded.
Jan 12 22:59:35 node3.dimagre.com systemd[1]: Stopped Portworx OCI Container.
Jan 12 22:59:35 node3.dimagre.com systemd[1]: Stopping Portworx FIFO logging reader...
Jan 12 22:59:35 node3.dimagre.com systemd[1]: portworx-output.service: Succeeded.
Jan 12 22:59:35 node3.dimagre.com systemd[1]: Stopped Portworx FIFO logging reader.
Jan 12 22:59:35 node3.dimagre.com systemd[1]: portworx.socket: Succeeded.
Jan 12 22:59:35 node3.dimagre.com systemd[1]: Closed Portworx logging FIFO.
Jan 12 22:59:42 node3.dimagre.com sudo[2909036]: pam_unix(sudo:session): session closed for user root
Jan 12 22:59:48 node3.dimagre.com sudo[2910820]: **dgrekov : TTY=pts/1 ; PWD=/home/dgrekov ; USER=root ; COMMAND=/bin/journalctl -f**
Jan 12 22:59:48 node3.dimagre.com sudo[2910820]: pam_unix(sudo:session): session opened for user root by dgrekov(uid=0)
Jan 12 22:59:58 node3.dimagre.com systemd[1]: Listening on Portworx logging FIFO.
Jan 12 22:59:58 node3.dimagre.com systemd[1]: Started Portworx FIFO logging reader.
Jan 12 22:59:58 node3.dimagre.com systemd[1]: Starting Portworx OCI Container...
Jan 12 22:59:58 node3.dimagre.com systemd[1]: Started Portworx OCI Container.
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Rootfs found at /opt/pwx/oci/rootfs"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="PX binaries found at /opt/pwx/bin/px-runc"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Multipath conf update disabled. PX devices will not be blacklisted."
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Initializing as version 2.6.1.4-775a586 (OCI)"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="SPEC READ [8ad01aa4095b1424836bcb8f54d0d757 /opt/pwx/oci/config.json]"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Enabling Sharedv4 NFS support ..."
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Setting up NFS service"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="> Initialized service controls via DBus{type:dbus,svc:nfs-server.service,id:0xc420292b20}"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Checking mountpoints for following shared directories: [/var/lib/kubelet /var/lib/osd]"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Found following mountpoints for shared dirs: map[/var/lib/kubelet:{isMP=f,Opts=shared:1,Parent=/} /:{isMP=T,Opts=shared:1} /var/lib/osd:{isMP=f,Opts=shared:1,Parent=/}]"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="PX-RunC arguments: -A -b -c px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7 -f -j auto -k etcd:http://etcd.dimagre.com:2379 -secret_type k8s"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="PX-RunC mounts: /dev:/dev /etc/exports:/etc/exports /opt/pwx/oci/mounts/etc/hosts:/etc/hosts /etc/iscsi:/etc/iscsi /etc/mdadm:/etc/mdadm /etc/nvme:/etc/nvme /etc/nvmet:/etc/nvmet /etc/pwx:/etc/pwx /opt/pwx/oci/mounts/etc/resolv.conf:/etc/resolv.conf /etc/target:/etc/target /opt/pwx/bin:/export_bin /proc:/hostproc /lib/modules:/lib/modules proc:/proc:nosuid,noexec,nodev /run/docker:/run/docker /run/lock/iscsi:/run/lock/iscsi /run/log/journal:/run/log/journal:ro /run/lvm:/run/lvm /run/mdadm:/run/mdadm /run/udev:/run/udev sysfs:/sys:nosuid,noexec,nodev cgroup:/sys/fs/cgroup:nosuid,noexec,nodev /opt/pwx/oci/mounts/tmp/px-termination-log:/tmp/px-termination-log /var/cores:/var/cores /run:/var/host_run:bind /var/lib/iscsi:/var/lib/iscsi /var/lib/kubelet:/var/lib/kubelet:shared /var/lib/nfs:/var/lib/nfs /var/lib/osd:/var/lib/osd:shared /var/lock/iscsi:/var/lock/iscsi /var/log/journal:/var/log/journal:ro /var/run/dbus:/var/run/dbus /opt/pwx/oci/mounts/var/run/secrets/kubernetes.io/serviceaccount:/var/run/secrets/kubernetes.io/serviceaccount"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="PX-RunC env: ALERTMANAGER_PORTWORX_PORT=tcp://10.3.155.150:9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP=tcp://10.3.155.150:9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP_ADDR=10.3.155.150 ALERTMANAGER_PORTWORX_PORT_9093_TCP_PORT=9093 ALERTMANAGER_PORTWORX_PORT_9093_TCP_PROTO=tcp ALERTMANAGER_PORTWORX_SERVICE_HOST=10.3.155.150 ALERTMANAGER_PORTWORX_SERVICE_PORT=9093 ALERTMANAGER_PORTWORX_SERVICE_PORT_WEB=9093 AUTOPILOT_PORT=tcp://10.3.166.6:9628 AUTOPILOT_PORT_9628_TCP=tcp://10.3.166.6:9628 AUTOPILOT_PORT_9628_TCP_ADDR=10.3.166.6 AUTOPILOT_PORT_9628_TCP_PORT=9628 AUTOPILOT_PORT_9628_TCP_PROTO=tcp AUTOPILOT_SERVICE_HOST=10.3.166.6 AUTOPILOT_SERVICE_PORT=9628 AUTOPILOT_SERVICE_PORT_AUTOPILOT=9628 AUTO_NODE_RECOVERY_TIMEOUT_IN_SECS=1500 CONTAINER_RUNTIME=docker COREDNS_PORT=udp://10.3.0.10:53 COREDNS_PORT_53_TCP=tcp://10.3.0.10:53 COREDNS_PORT_53_TCP_ADDR=10.3.0.10 COREDNS_PORT_53_TCP_PORT=53 COREDNS_PORT_53_TCP_PROTO=tcp COREDNS_PORT_53_UDP=udp://10.3.0.10:53 COREDNS_PORT_53_UDP_ADDR=10.3.0.10 COREDNS_PORT_53_UDP_PORT=53 COREDNS_PORT_53_UDP_PROTO=udp COREDNS_SERVICE_HOST=10.3.0.10 COREDNS_SERVICE_PORT=53 COREDNS_SERVICE_PORT_DNS=53 COREDNS_SERVICE_PORT_DNS_TCP=53 CSI_ENDPOINT=unix:///var/lib/kubelet/plugins/pxd.portworx.com/csi.sock GOMAXPROCS=64 GOTRACEBACK=crash KUBELET_DIR=/var/lib/kubelet KUBERNETES_PORT=tcp://10.3.0.1:443 KUBERNETES_PORT_443_TCP=tcp://10.3.0.1:443 KUBERNETES_PORT_443_TCP_ADDR=10.3.0.1 KUBERNETES_PORT_443_TCP_PORT=443 KUBERNETES_PORT_443_TCP_PROTO=tcp KUBERNETES_SERVICE_HOST=10.3.0.1 KUBERNETES_SERVICE_PORT=443 KUBERNETES_SERVICE_PORT_HTTPS=443 LVM_USE_HOST=1 NFS_SERVICE=nfs-server.service PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin PORTWORX_API_PORT=tcp://10.3.224.150:9001 PORTWORX_API_PORT_9001_TCP=tcp://10.3.224.150:9001 PORTWORX_API_PORT_9001_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9001_TCP_PORT=9001 PORTWORX_API_PORT_9001_TCP_PROTO=tcp 
PORTWORX_API_PORT_9020_TCP=tcp://10.3.224.150:9020 PORTWORX_API_PORT_9020_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9020_TCP_PORT=9020 PORTWORX_API_PORT_9020_TCP_PROTO=tcp PORTWORX_API_PORT_9021_TCP=tcp://10.3.224.150:9021 PORTWORX_API_PORT_9021_TCP_ADDR=10.3.224.150 PORTWORX_API_PORT_9021_TCP_PORT=9021 PORTWORX_API_PORT_9021_TCP_PROTO=tcp PORTWORX_API_SERVICE_HOST=10.3.224.150 PORTWORX_API_SERVICE_PORT=9001 PORTWORX_API_SERVICE_PORT_PX_API=9001 PORTWORX_API_SERVICE_PORT_PX_REST_GATEWAY=9021 PORTWORX_API_SERVICE_PORT_PX_SDK=9020 PORTWORX_SERVICE_PORT=tcp://10.3.138.68:9001 PORTWORX_SERVICE_PORT_9001_TCP=tcp://10.3.138.68:9001 PORTWORX_SERVICE_PORT_9001_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9001_TCP_PORT=9001 PORTWORX_SERVICE_PORT_9001_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9019_TCP=tcp://10.3.138.68:9019 PORTWORX_SERVICE_PORT_9019_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9019_TCP_PORT=9019 PORTWORX_SERVICE_PORT_9019_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9020_TCP=tcp://10.3.138.68:9020 PORTWORX_SERVICE_PORT_9020_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9020_TCP_PORT=9020 PORTWORX_SERVICE_PORT_9020_TCP_PROTO=tcp PORTWORX_SERVICE_PORT_9021_TCP=tcp://10.3.138.68:9021 PORTWORX_SERVICE_PORT_9021_TCP_ADDR=10.3.138.68 PORTWORX_SERVICE_PORT_9021_TCP_PORT=9021 PORTWORX_SERVICE_PORT_9021_TCP_PROTO=tcp PORTWORX_SERVICE_SERVICE_HOST=10.3.138.68 PORTWORX_SERVICE_SERVICE_PORT=9001 PORTWORX_SERVICE_SERVICE_PORT_PX_API=9001 PORTWORX_SERVICE_SERVICE_PORT_PX_KVDB=9019 PORTWORX_SERVICE_SERVICE_PORT_PX_REST_GATEWAY=9021 PORTWORX_SERVICE_SERVICE_PORT_PX_SDK=9020 PROMETHEUS_PORT=tcp://10.3.34.111:9090 PROMETHEUS_PORT_9090_TCP=tcp://10.3.34.111:9090 PROMETHEUS_PORT_9090_TCP_ADDR=10.3.34.111 PROMETHEUS_PORT_9090_TCP_PORT=9090 PROMETHEUS_PORT_9090_TCP_PROTO=tcp PROMETHEUS_SERVICE_HOST=10.3.34.111 PROMETHEUS_SERVICE_PORT=9090 PROMETHEUS_SERVICE_PORT_WEB=9090 PX_IMAGE=portworx/px-essentials:2.6.1.4 PX_IMAGE_DIGEST=sha256:55d0411bfd033b4a98356b90efbc8ba4a0ca97418374b551d7d8d36b31888d50 
PX_LIGHTHOUSE_PORT=tcp://10.3.29.146:80 PX_LIGHTHOUSE_PORT_443_TCP=tcp://10.3.29.146:443 PX_LIGHTHOUSE_PORT_443_TCP_ADDR=10.3.29.146 PX_LIGHTHOUSE_PORT_443_TCP_PORT=443 PX_LIGHTHOUSE_PORT_443_TCP_PROTO=tcp PX_LIGHTHOUSE_PORT_80_TCP=tcp://10.3.29.146:80 PX_LIGHTHOUSE_PORT_80_TCP_ADDR=10.3.29.146 PX_LIGHTHOUSE_PORT_80_TCP_PORT=80 PX_LIGHTHOUSE_PORT_80_TCP_PROTO=tcp PX_LIGHTHOUSE_SERVICE_HOST=10.3.29.146 PX_LIGHTHOUSE_SERVICE_PORT=80 PX_LIGHTHOUSE_SERVICE_PORT_HTTP=80 PX_LIGHTHOUSE_SERVICE_PORT_HTTPS=443 PX_LOGLEVEL=info PX_RUNC=true PX_SHARED=/var/lib/kubelet:shared:1;/var/lib/osd:shared:1 PX_TEMPLATE_VERSION=v4 PX_VERSION=2.6.1.4-775a586 SEALED_SECRETS_CONTROLLER_PORT=tcp://10.3.127.66:8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP=tcp://10.3.127.66:8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_ADDR=10.3.127.66 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PORT=8080 SEALED_SECRETS_CONTROLLER_PORT_8080_TCP_PROTO=tcp SEALED_SECRETS_CONTROLLER_SERVICE_HOST=10.3.127.66 SEALED_SECRETS_CONTROLLER_SERVICE_PORT=8080 STORK_SERVICE_PORT=tcp://10.3.4.152:8099 STORK_SERVICE_PORT_443_TCP=tcp://10.3.4.152:443 STORK_SERVICE_PORT_443_TCP_ADDR=10.3.4.152 STORK_SERVICE_PORT_443_TCP_PORT=443 STORK_SERVICE_PORT_443_TCP_PROTO=tcp STORK_SERVICE_PORT_8099_TCP=tcp://10.3.4.152:8099 STORK_SERVICE_PORT_8099_TCP_ADDR=10.3.4.152 STORK_SERVICE_PORT_8099_TCP_PORT=8099 STORK_SERVICE_PORT_8099_TCP_PROTO=tcp STORK_SERVICE_SERVICE_HOST=10.3.4.152 STORK_SERVICE_SERVICE_PORT=8099 STORK_SERVICE_SERVICE_PORT_EXTENDER=8099 STORK_SERVICE_SERVICE_PORT_WEBHOOK=443 TERM=xterm container=oci"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Parent OCI mount '/opt/pwx/oci' already on PRIVATE propagation (private)"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Found 2 usable runc binaries: /opt/pwx/bin/runc and /opt/pwx/bin/runc-fb"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Detected kernel release 5.4.83"
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: time="2021-01-12T22:59:58Z" level=info msg="Exec: [\"/opt/pwx/bin/runc\" \"run\" \"-b\" \"/opt/pwx/oci\" \"--no-new-keyring\" \"portworx\"]"
Jan 12 22:59:58 node3.dimagre.com systemd[1]: **Failed to generate valid unit name from path '/var/lib/rkt/pods/exited-garbage/6057d8fc-6d4f-4278-a0e9-86ba10329d8a/stage1/rootfs/opt/stage2/kubelet/rootfs/var/lib/kubelet/pods/81b62b50-51da-4e0c-be3f-c004832ef698/volumes/kubernetes.io~secret/kube-proxy-token-nlq5c', ignoring mount point: Invalid argument**
Jan 12 22:59:58 node3.dimagre.com systemd[2909012]: **Failed to generate valid unit name from path '/var/lib/rkt/pods/exited-garbage/6057d8fc-6d4f-4278-a0e9-86ba10329d8a/stage1/rootfs/opt/stage2/kubelet/rootfs/var/lib/kubelet/pods/81b62b50-51da-4e0c-be3f-c004832ef698/volumes/kubernetes.io~secret/kube-proxy-token-nlq5c', ignoring mount point: Invalid argument**
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: Executing with arguments: -A -b -c px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7 -f -j auto -k etcd:http://etcd.dimagre.com:2379 -secret_type k8s
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: Installed pxctl...
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: Tue Jan 12 22:59:58 UTC 2021 : Running version 2.6.1.4-775a586 on Linux node3.dimagre.com 5.4.83-flatcar #1 SMP Tue Dec 15 18:31:34 -00 2020 x86_64 x86_64 x86_64 GNU/Linux
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: Version: Linux version 5.4.83-flatcar (build@pony-truck.infra.kinvolk.io) (gcc version 9.3.0 (Gentoo Hardened 9.3.0-r1 p3)) #1 SMP Tue Dec 15 18:31:34 -00 2020
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: mapping:
Jan 12 22:59:58 node3.dimagre.com portworx[2910828]: Setting portmap: 9001
Jan 12 22:59:59 node3.dimagre.com systemd-udevd[2910960]: Using default interface naming scheme 'v245'.
Jan 12 22:59:59 node3.dimagre.com systemd-udevd[2910950]: Using default interface naming scheme 'v245'.
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: sed: can't read /etc/mdadm/mdadm.conf: No such file or directory
Jan 12 22:59:59 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event3 (Power Button)
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Skipping reassembly as no px array config found
Jan 12 22:59:59 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event2 (Power Button)
Jan 12 22:59:59 node3.dimagre.com kernel: **IPVS: rr: UDP 10.1.0.7:53 - no destination available**
Jan 12 22:59:59 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event0 (ServerEngines SE USB Device)
Jan 12 22:59:59 node3.dimagre.com systemd-udevd[2910948]: Using default interface naming scheme 'v245'.
Jan 12 22:59:59 node3.dimagre.com systemd-udevd[2910972]: Using default interface naming scheme 'v245'.
Jan 12 22:59:59 node3.dimagre.com kernel: pxd driver at version: remotes/origin/v2.6.0~1:228c7b5119cd6cddc0c79e3c3bc23eac5ca8d4c9
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: PXD version: 228c7b5119cd6cddc0c79e3c3bc23eac5ca8d4c9
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Checking fs version...
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Module version check: Success
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Done checking fs version...
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: use partitions and all available disks.
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Using cluster: px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Using journal device: auto
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Key Value Store: etcd:http://etcd.dimagre.com:2379
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: Clearing lttng tmpfs location: /var/lib/osd/lttng...
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: ******************************************************************
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: ****** Checking mdraid0 layout path for null **************
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: ******************************************************************
Jan 12 22:59:59 node3.dimagre.com portworx[2910828]: patch_fs already done
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: Checking sysfs mount...
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: sysfs on /sys/firmware type sysfs (ro,relatime,seclabel)
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: sysfs mounted read-only. remounting...
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: mapping:
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: Setting portmap: 9001
Jan 12 23:00:00 node3.dimagre.com portworx[2910828]: "bootstrap": true,
Jan 12 23:00:01 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:01,286 CRIT Supervisor running as root (no user in config file)
Jan 12 23:00:01 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:01,290 INFO supervisord started with pid 1
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,294 INFO spawned: 'reboot-diags' with pid 249
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,296 INFO spawned: 'px-nfs' with pid 250
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,298 INFO spawned: 'relayd' with pid 251
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,301 INFO spawned: 'cron' with pid 252
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,303 INFO spawned: 'px-etcd' with pid 253
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,306 INFO spawned: 'lttng' with pid 254
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,309 INFO spawned: 'exec' with pid 255
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,312 INFO spawned: 'cache_flush' with pid 256
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,315 INFO spawned: 'px-diag' with pid 264
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,317 INFO spawned: 'px-healthmon' with pid 270
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,320 INFO spawned: 'pxdaemon' with pid 272
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,322 INFO spawned: 'px-ns' with pid 276
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,325 INFO spawned: 'px_event_listener' with pid 281
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,326 INFO exited: reboot-diags (exit status 0; expected)
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,326 INFO exited: px-nfs (exit status 0; expected)
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:02,326 INFO exited: cache_flush (exit status 0; expected)
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Tracefile cleanup: Tracing disabled, remove all previous traces...
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Clean out lttng tmpfs location: ...
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:02Z" level=info msg="px-ns Starting.."
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:02Z" level=info msg="InitPxClient No authentication enabled"
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Installed NS trace handler for SIGHUP
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Installed NS sig-handler for SIGUSR1
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Installed NS sig-handler for SIGUSR2
Jan 12 23:00:02 node3.dimagre.com portworx[2910828]: Starting NS server
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,388 INFO success: relayd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,388 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,388 INFO success: px-etcd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,388 INFO success: lttng entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,388 INFO success: exec entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,389 INFO success: px-diag entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,389 INFO success: px-healthmon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,389 INFO success: px-ns entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:03 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:03,389 INFO success: px_event_listener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: 2021-01-12 23:00:07,361 INFO success: pxdaemon entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: Tracing is disabled, not starting trace processes.
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: PXPROCS[INFO]: Started px-storage with pid 338
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: bash: connect: Connection refused
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: bash: /dev/tcp/localhost/9009: Connection refused
Jan 12 23:00:07 node3.dimagre.com portworx[2910828]: PXPROCS[INFO]: px-storage not started yet...sleeping
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: PXPROCS[INFO]: Started px with pid 349
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: PXPROCS[INFO]: Started watchdog with pid 350
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: 2021-01-12_23:00:10: PX-Watchdog: Starting watcher
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: 2021-01-12_23:00:10: PX-Watchdog: Waiting for px process to start
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: 2021-01-12_23:00:10: PX-Watchdog: (pid 349): Begin monitoring
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="Registering [kernel] as a volume driver"
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="Registered the Usage based Metering Agent...."
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="Setting log level to info(4)"
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="read config from env var" func=init package=boot
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="read config from config.json" func=init package=boot
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="No scheduler hook detected."
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="Alerts initialized successfully for this cluster"
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com kernel: IPVS: rr: UDP 10.1.0.7:53 - no destination available
Jan 12 23:00:10 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:10Z" level=info msg="Setting lock timeout to: 3m0s"
Jan 12 23:00:10 node3.dimagre.com systemd-udevd[2911281]: Using default interface naming scheme 'v245'.
Jan 12 23:00:10 node3.dimagre.com systemd-udevd[2911287]: Using default interface naming scheme 'v245'.
Jan 12 23:00:10 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event3 (Power Button)
Jan 12 23:00:10 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event2 (Power Button)
Jan 12 23:00:10 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event0 (ServerEngines SE USB Device)
Jan 12 23:00:10 node3.dimagre.com systemd-udevd[2911286]: Using default interface naming scheme 'v245'.
Jan 12 23:00:10 node3.dimagre.com systemd-udevd[2911280]: Using default interface naming scheme 'v245'.
Jan 12 23:00:11 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:11Z" level=info msg="Node is initialized" func=setNodeInfo package=boot
Jan 12 23:00:11 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:11Z" level=info msg="Using GW interface device:[enp5s0f0]..."
Jan 12 23:00:11 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:11Z" level=info msg="Detected Machine Hardware Type as: unk (Virtual Machine)"
Jan 12 23:00:11 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:11Z" level=info msg="Bootstrapping internal kvdb service." fn=kv-store.New id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:11 node3.dimagre.com systemd-udevd[2911291]: Using default interface naming scheme 'v245'.
Jan 12 23:00:11 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event3 (Power Button)
Jan 12 23:00:11 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event2 (Power Button)
Jan 12 23:00:11 node3.dimagre.com systemd-udevd[2911272]: Using default interface naming scheme 'v245'.
Jan 12 23:00:11 node3.dimagre.com systemd-logind[797]: Watching system buttons on /dev/input/event0 (ServerEngines SE USB Device)
Jan 12 23:00:12 node3.dimagre.com kernel: BTRFS info (device sda10): setting nodatacow, compression disabled
Jan 12 23:00:12 node3.dimagre.com kernel: BTRFS info (device sda10): disk space caching is enabled
Jan 12 23:00:12 node3.dimagre.com kernel: BTRFS info (device sda10): has skinny extents
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Setting up internal kvdb with following parameters: " fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Initial Cluster Settings: map[8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4:[http://portworx-1.internal.kvdb:9018]]" fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Kvdb IP: 10.0.0.85 Kvdb PeerPort: 9018 ClientPort: 9019" fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Kvdb Name: 8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4" fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Kvdb Cluster State: existing" fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:12 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:12Z" level=info msg="Kvdb Peer Domain Name: portworx-1.internal.kvdb" fn=kv-utils.StartKvdb id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Registered auditor for kvdb-response"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Registered auditor for kvdb-limits"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="created kv instance" func=initKv package=boot
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Setting lock timeout to: 3m0s"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="creating kvdb metrics wrapper"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="initialized internal kvdb" func=init package=boot
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="initialized osdconfig manager" func=init package=boot
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="pushed config data to kvdb" func=InitAndBoot package=boot
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Starting PX Version: 2.6.1.4-775a586 - Build Version 775a586859e1c3d738cfe9f617d65a388d321f72"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Using GW interface device:[enp5s0f0]..."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Found the following shared mountpoints: [/var/lib/kubelet /var/lib/osd]"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Node 8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4 with Index (5) is Up"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="node previously initialized:true" func=main package=main
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Configuration Loaded..."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Cluster ID: px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7 (UUID: \"28dc1639-652c-4933-aa45-f924d144b39d\")"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Node ID: 8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Node Index: 5"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Management Iface: "
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Discovery Server(s): [http://etcd.dimagre.com:2379]"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Storage Type: Devices: [/dev/sda10], Raid Level: data() md()"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX Node Cache Function Attributes=CacheDevices: [], DedicatedCache: false"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Detected hardware type as: VirtualMachine"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Initializing licensing"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="No Trial/Enterprise licenses installed"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Licensing engine initialized using local PX-Essential license."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license Nodes{count:5,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license Volumes{count:500,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license VolumeSize{count:1,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license NodeCapacity{count:1,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license NodeCapacityExtension{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license Snapshots{count:5,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license LocalVolumeAttaches{count:30,expires:2021-01-13 23:00:1
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license AggregatedVolume{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license SharedVolume{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license ScaledVolume{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license EncryptedVolume{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license GlobalSecretsOnly{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license ResizeVolume{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license SnapshotToObjectStore{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license SnapshotToObjectStoreDaily{count:1,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license CloudMigration{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license DisasterRecovery{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license AUTCapacityManagement{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license OIDCSecurity{count:0,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license EnablePlatformBare{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license EnablePlatformVM{count:unlimited,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Parsed license HaLevel{count:3,expires:2021-01-13 23:00:15}"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Global license watcher installed." startIdx=161
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX-Essential license configured successfully"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Attempting Secrets Login to Kubernetes Secrets endpoint..."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX starting cluster manager..."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="PX cluster manager running."
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Adding cluster event listener: Scheduler"
Jan 12 23:00:14 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:14Z" level=info msg="Converting VolumeSpecs to SdkStoragePolicyObjects..."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Authentication with Kubernetes Secrets succeeded!"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="PX starting storage..."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Adding cluster event listener: PX Storage Service"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="px-dummy stopped listening..."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="SDK TLS disabled" name=SDK-tcp
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="SDK-tcp gRPC Server ready on [::]:9020"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="SDK TLS disabled" name=SDK-unix
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="SDK-unix gRPC Server ready on /var/lib/osd/driver/pwx.sock"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="SDK gRPC REST Gateway started on port :9021"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Setting concurrent API limit: 20"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Starting server on port: :9001"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="PX API server running on port 9001."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Starting API Server with TLS Disabled."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Starting Watchdog server."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Adding cluster event listener: Kvdb_Cluster_Listener"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Cluster manager starting..."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="initializing osdconfig manager"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Cluster state is OK... Joining the cluster."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Node 8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4 joining cluster..."
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Cluster ID: px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Node Mgmt IP: 10.0.0.85"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Node Data IP: 10.0.0.85"
Jan 12 23:00:15 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:15Z" level=info msg="Node HWType: VirtualMachine"
Jan 12 23:00:16 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:16Z" level=info msg="This node participates in quorum decisions"
Jan 12 23:00:17 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:17Z" level=info msg="Handled update for kvdb node (8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4)" fn=kv-listener.Update id=8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
Jan 12 23:00:17 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:17Z" level=info msg="Updating proto driver with cluster domain info"
Jan 12 23:00:17 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:17Z" level=info msg="Cluster manager starting watch at version 171"
Jan 12 23:00:17 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:17Z" level=info msg="Waiting for the cluster to reach quorum..."
Jan 12 23:00:17 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:17Z" level=info msg="Starting Gossip... Gossiping to these nodes : [10.0.0.63:9002 10.0.0.168:9002 10.0.0.176:9002 10.0.0.176:9002]"
Jan 12 23:00:20 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:20Z" level=info msg="gossip: Unable to join other nodes at startup : 4 errors occurred:\n\t* Failed to join 10.0.0.63: dial tcp 10.0.0.63:9002: connect: no route to host\n\t* Failed to join 10.0.0.168: dial tcp 10.0.0.168:9002: connect: connection refused\n\t* Failed to join 10.0.0.176: dial tcp 10.0.0.176:9002: connect: connection refused\n\t* Failed to join 10.0.0.176: dial tcp 10.0.0.176:9002: connect: connection refused\n\n"
Jan 12 23:00:20 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:20Z" level=info msg="gossip: Adding Node to gossip map: 9ca3418a-8716-4552-8516-d6be206b2e3b"
Jan 12 23:00:20 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:20Z" level=info msg="gossip: Adding Node to gossip map: 082975fd-c1c7-40c0-b52a-287235748ce7"
Jan 12 23:00:20 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:20Z" level=info msg="gossip: Adding Node to gossip map: 24a98160-5122-464b-b8d4-b61931b65d41"
Jan 12 23:00:20 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:20Z" level=info msg="gossip: Adding Node to gossip map: 6ccbf316-5263-4a0b-8b4b-0c8b1a61cc21"
Jan 12 23:00:26 node3.dimagre.com portworx[2910828]: time="2021-01-12T23:00:26Z" level=info msg="Updated the current set of kvdb endpoints to: [http://10.0.0.85:9019]"

From the logs, I can see that at least this node is trying to come up, but it is waiting for a quorum number of nodes to join. Did you start Portworx on the rest of the nodes?

Essentially, you need to run the modified install command and start Portworx on these nodes:

10.0.0.63 
10.0.0.168 
10.0.0.176
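The gossip errors in the log above show node3 failing to reach port 9002 on those peers. Once Portworx is started on them, a quick way to confirm the gossip port is open is the same bash `/dev/tcp` trick that appears in the logs. This is only a hypothetical probe script (the IPs and port 9002 are taken from the gossip errors above; adjust for your environment):

```shell
#!/usr/bin/env bash
# Probe the Portworx gossip port (9002) on each peer node listed in the
# gossip errors. Prints one reachable/unreachable line per peer.
for ip in 10.0.0.63 10.0.0.168 10.0.0.176; do
  if timeout 2 bash -c "exec 3<>/dev/tcp/${ip}/9002" 2>/dev/null; then
    echo "${ip}:9002 reachable"
  else
    echo "${ip}:9002 unreachable"
  fi
done
```

If any peer still shows unreachable after its Portworx service is up, check firewalls/routing between the nodes before chasing anything else.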

For ease of communication:

I’ve run the commands on nodes 1 and 0, but the service is stuck in a restart loop, like node3 was originally.

Example from node1:

Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:32,741 INFO spawned: 'pxdaemon' with pid 20026
Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:32,742 INFO reaped unknown pid 19968
Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px-storage with pid 20064
Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: bash: connect: Connection refused
Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: bash: /dev/tcp/localhost/9009: Connection refused
Jan 12 23:46:32 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: px-storage not started yet...sleeping
Jan 12 23:46:35 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px with pid 20076
Jan 12 23:46:35 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started watchdog with pid 20077
Jan 12 23:46:35 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:35: PX-Watchdog: Starting watcher
Jan 12 23:46:35 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:35: PX-Watchdog: Waiting for px process to start
Jan 12 23:46:35 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:35: PX-Watchdog: (pid 20076): Begin monitoring
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="Registering [kernel] as a volume driver"
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="Registered the Usage based Metering Agent...."
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="Setting log level to info(4)"
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="read config from env var" func=init package=boot
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="read config from config.json" func=init package=boot
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="No scheduler hook detected."
Jan 12 23:46:36 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:36Z" level=info msg="Alerts initialized successfully for this cluster"
Jan 12 23:46:38 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:38,127 INFO success: pxdaemon entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
Jan 12 23:46:41 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:41,678 INFO reaped unknown pid 19630
Jan 12 23:46:43 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:43Z" level=error msg="unable to use bootstrap kvdb: context deadline exceeded" func=InitAndBoot package=boot
Jan 12 23:46:43 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:43Z" level=error msg="Could not init boot manager" error="unable to use bootstrap kvdb: context deadline exceeded"
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: px daemon exited with code: 1
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:44,154 INFO exited: pxdaemon (exit status 1; not expected)
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:44,183 INFO spawned: 'pxdaemon' with pid 20120
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:44,184 INFO reaped unknown pid 20064
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px-storage with pid 20158
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: bash: connect: Connection refused
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: bash: /dev/tcp/localhost/9009: Connection refused
Jan 12 23:46:44 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: px-storage not started yet...sleeping
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px with pid 20170
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started watchdog with pid 20171
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:47: PX-Watchdog: Starting watcher
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:47: PX-Watchdog: Waiting for px process to start
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:47: PX-Watchdog: (pid 20170): Begin monitoring
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="Registering [kernel] as a volume driver"
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="Registered the Usage based Metering Agent...."
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="Setting log level to info(4)"
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="read config from env var" func=init package=boot
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="read config from config.json" func=init package=boot
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="No scheduler hook detected."
Jan 12 23:46:47 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:47Z" level=info msg="Alerts initialized successfully for this cluster"
Jan 12 23:46:49 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:49,525 INFO success: pxdaemon entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
Jan 12 23:46:53 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:53,079 INFO reaped unknown pid 19727
Jan 12 23:46:54 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:54Z" level=error msg="unable to use bootstrap kvdb: context deadline exceeded" func=InitAndBoot package=boot
Jan 12 23:46:54 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:54Z" level=error msg="Could not init boot manager" error="unable to use bootstrap kvdb: context deadline exceeded"
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: px daemon exited with code: 1
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:55,554 INFO exited: pxdaemon (exit status 1; not expected)
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:55,587 INFO spawned: 'pxdaemon' with pid 20215
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: 2021-01-12 23:46:55,588 INFO reaped unknown pid 20158
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px-storage with pid 20253
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: bash: connect: Connection refused
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: bash: /dev/tcp/localhost/9009: Connection refused
Jan 12 23:46:55 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: px-storage not started yet...sleeping
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started px with pid 20265
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: PXPROCS[INFO]: Started watchdog with pid 20266
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:58: PX-Watchdog: Starting watcher
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:58: PX-Watchdog: Waiting for px process to start
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: 2021-01-12_23:46:58: PX-Watchdog: (pid 20265): Begin monitoring
Jan 12 23:46:58 node1.dimagre.com portworx[3926636]: time="2021-01-12T23:46:58Z" level=info msg="Registering [kernel] as a volume driver"
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="Registered the Usage based Metering Agent...."
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="Setting log level to info(4)"
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="read config from env var" func=init package=boot
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="read config from config.json" func=init package=boot
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="No scheduler hook detected."
Jan 12 23:46:58 node1.dimagre.com **portworx** [3926636]: time="2021-01-12T23:46:58Z" level=info msg="Alerts initialized successfully for this cluster"
Jan 12 23:47:00 node1.dimagre.com **portworx** [3926636]: 2021-01-12 23:47:00,937 INFO success: pxdaemon entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)

Can node0 and node1 reach this etcd endpoint?

etcd:http://etcd.dimagre.com:2379

yes they can reach it.

From the logs it looks like Portworx cannot perform get/put operations with that etcd endpoint.

  • Let's double-check that the etcd endpoint was provided correctly to the rest of the nodes.
  • Check if you can run the following command from node0 and node1
    curl -L http://etcd.dimagre.com:2379/health
**node1** **dgrekov #** curl -L http://etcd.dimagre.com:2379/health
{"health":"true"} 
**node1** **dgrekov #**

and

**node0** **dgrekov #** curl -L http://etcd.dimagre.com:2379/health
{"health":"true"} 
**node0** **dgrekov #**

Ok that etcd looks healthy. I am not sure why PX is having difficulty talking to it.

Let's try this:

  1. Download etcdctl on one of those nodes
curl -L https://storage.googleapis.com/etcd/v3.3.1/etcd-v3.3.1-linux-amd64.tar.gz -o /tmp/etcd-v3.3.1-linux-amd64.tar.gz
cd /tmp 
tar -xzvf /tmp/etcd-v3.3.1-linux-amd64.tar.gz
  2. Check if you can get/put
ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx/
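If that etcdctl get hangs, it helps to separate a slow dial from a dead endpoint. Here is a minimal probe of the same /health endpoint the curl check above hits, with a configurable timeout so you can compare a short and a long one (a sketch; the endpoint URL is the one from this thread):

```python
import time
import urllib.error
import urllib.request

def etcd_healthy(endpoint, timeout):
    """Return (ok, elapsed_seconds) for GET <endpoint>/health."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(endpoint + "/health", timeout=timeout) as resp:
            # etcd's /health returns a small JSON body like {"health":"true"}
            ok = b'"health"' in resp.read()
    except (urllib.error.URLError, OSError):
        ok = False
    return ok, time.monotonic() - start

# Example: compare a short vs. a long timeout against the same endpoint.
# for t in (2, 15):
#     print(t, etcd_healthy("http://etcd.dimagre.com:2379", t))
```

If the short timeout fails but the long one succeeds, the endpoint itself is fine and something in front of it (DNS, routing) is eating the dial budget.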

Also can you paste contents of /etc/pwx/config.json from one of those nodes?

ok, now we’re getting somewhere:

**node0** **etcd-v3.3.1-linux-amd64 #** ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx
Error: context deadline exceeded
**node0** **etcd-v3.3.1-linux-amd64 #** ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx --debug
ETCDCTL_CACERT=
ETCDCTL_CERT=
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[http://etcd.dimagre.com:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=
ETCDCTL_USER=
ETCDCTL_WATCH_KEY=
ETCDCTL_WATCH_RANGE_END=
ETCDCTL_WRITE_OUT=simple
INFO: 2021/01/13 00:21:11 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4201d87e0
INFO: 2021/01/13 00:21:11 dialing to target with scheme: ""
INFO: 2021/01/13 00:21:11 could not get resolver for scheme: ""
INFO: 2021/01/13 00:21:11 balancerWrapper: is pickfirst: false
INFO: 2021/01/13 00:21:11 balancerWrapper: got update addr from Notify: [{etcd.dimagre.com:2379 <nil>}]
INFO: 2021/01/13 00:21:11 ccBalancerWrapper: new subconn: [{etcd.dimagre.com:2379 0 <nil>}]
INFO: 2021/01/13 00:21:11 balancerWrapper: handle subconn state change: 0xc42012e800, CONNECTING
INFO: 2021/01/13 00:21:11 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4201d87e0
WARNING: 2021/01/13 00:21:13 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {etcd.dimagre.com:2379 0 <nil>}
WARNING: 2021/01/13 00:21:13 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {etcd.dimagre.com:2379 0 <nil>}
WARNING: 2021/01/13 00:21:13 Failed to dial etcd.dimagre.com:2379: grpc: the connection is closing; please retry.
WARNING: 2021/01/13 00:21:13 Failed to dial etcd.dimagre.com:2379: grpc: the connection is closing; please retry.
Error: context deadline exceeded

while on node3

**node3** **etcd-v3.3.1-linux-amd64 #** ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7//bootstrap_entries
[{"IP":"10.0.0.85","ID":"8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4","Index":0,"State":2,"Type":1,"Ts":"2021-01-13T00:12:48.818161703Z","Version":"v2","peerport":"9018","clientport":"9019","Domain":"portworx-1.internal.kvdb","DataDirType":"BtrfsSubvolume"}]
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7/testConnection
200

The three nodes are physically next to each other, very interesting. Their config is supposed to be identical.

Ok, progress:

**node0** **etcd-v3.3.1-linux-amd64 #** ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx --dial-timeout=15s --command-timeout=20s --debug
ETCDCTL_CACERT=
ETCDCTL_CERT=
ETCDCTL_COMMAND_TIMEOUT=20s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=15s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[http://etcd.dimagre.com:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=
ETCDCTL_USER=
ETCDCTL_WATCH_KEY=
ETCDCTL_WATCH_RANGE_END=
ETCDCTL_WRITE_OUT=simple
INFO: 2021/01/13 00:39:54 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc420226c60
INFO: 2021/01/13 00:39:54 dialing to target with scheme: ""
INFO: 2021/01/13 00:39:54 could not get resolver for scheme: ""
INFO: 2021/01/13 00:39:54 balancerWrapper: is pickfirst: false
INFO: 2021/01/13 00:39:54 balancerWrapper: got update addr from Notify: [{etcd.dimagre.com:2379 <nil>}]
INFO: 2021/01/13 00:39:54 ccBalancerWrapper: new subconn: [{etcd.dimagre.com:2379 0 <nil>}]
INFO: 2021/01/13 00:39:54 balancerWrapper: handle subconn state change: 0xc420128ee0, CONNECTING
INFO: 2021/01/13 00:39:54 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc420226c60
INFO: 2021/01/13 00:39:59 balancerWrapper: handle subconn state change: 0xc420128ee0, READY
INFO: 2021/01/13 00:39:59 clientv3/balancer: pin "etcd.dimagre.com:2379"
INFO: 2021/01/13 00:39:59 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc420226c60
INFO: 2021/01/13 00:39:59 balancerWrapper: got update addr from Notify: [{etcd.dimagre.com:2379 <nil>}]
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7//bootstrap_entries
[{"IP":"10.0.0.85","ID":"8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4","Index":0,"State":2,"Type":1,"Ts":"2021-01-13T00:32:56.813419327Z","Version":"v2","peerport":"9018","clientport":"9019","Domain":"portworx-1.internal.kvdb","DataDirType":"BtrfsSubvolume"}]
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7/testConnection
200

That worked with a longer timeout. Can I change that in some env vars for Portworx?

We don’t have a setting to tune that dial timeout for etcd connections. Do you know what could be the reason for this network delay?
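When a dial only succeeds with a much longer timeout, slow name resolution is a common culprit. One way to confirm that is to time the lookup directly (a minimal sketch; a dead primary nameserver typically shows up as several seconds here, one resolv.conf timeout per retry):

```python
import socket
import time

def time_lookup(host):
    """Resolve host once and return (addresses, elapsed_seconds)."""
    start = time.monotonic()
    try:
        addrs = {ai[4][0] for ai in socket.getaddrinfo(host, None)}
    except socket.gaierror:
        addrs = set()
    return addrs, time.monotonic() - start

# addrs, elapsed = time_lookup("etcd.dimagre.com")
# A healthy resolver answers in milliseconds; multi-second results point at DNS.
```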

To go forward, we could add node0 and node1 to that etcd cluster, so that PX will talk to the local etcd running on each node and we can avoid the network delay.

I figured out the delay, though not really the fix yet. The primary DNS was on the k8s cluster that failed, so each node was trying that lookup first, timing out, and then retrying against the next nameserver. I’ve changed the hosts entry and DHCP settings to remove that issue; now the etcd CLI command looks like this:

**node0** **etcd-v3.3.1-linux-amd64 #** ETCDCTL_API=3 ./etcdctl --endpoints http://etcd.dimagre.com:2379 get --prefix pwx/ --debug
ETCDCTL_CACERT=
ETCDCTL_CERT=
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[http://etcd.dimagre.com:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=
ETCDCTL_USER=
ETCDCTL_WATCH_KEY=
ETCDCTL_WATCH_RANGE_END=
ETCDCTL_WRITE_OUT=simple
INFO: 2021/01/13 01:55:14 ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc4201e06c0
INFO: 2021/01/13 01:55:14 dialing to target with scheme: ""
INFO: 2021/01/13 01:55:14 could not get resolver for scheme: ""
INFO: 2021/01/13 01:55:14 balancerWrapper: is pickfirst: false
INFO: 2021/01/13 01:55:14 balancerWrapper: got update addr from Notify: [{etcd.dimagre.com:2379 <nil>}]
INFO: 2021/01/13 01:55:14 ccBalancerWrapper: new subconn: [{etcd.dimagre.com:2379 0 <nil>}]
INFO: 2021/01/13 01:55:14 balancerWrapper: handle subconn state change: 0xc42013abf0, CONNECTING
INFO: 2021/01/13 01:55:14 ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc4201e06c0
INFO: 2021/01/13 01:55:14 balancerWrapper: handle subconn state change: 0xc42013abf0, READY
INFO: 2021/01/13 01:55:14 clientv3/balancer: pin "etcd.dimagre.com:2379"
INFO: 2021/01/13 01:55:14 ccBalancerWrapper: updating state and picker called by balancer: READY, 0xc4201e06c0
INFO: 2021/01/13 01:55:14 balancerWrapper: got update addr from Notify: [{etcd.dimagre.com:2379 <nil>}]
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7//bootstrap_entries
[{"IP":"10.0.0.85","ID":"8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4","Index":0,"State":2,"Type":1,"Ts":"2021-01-13T01:53:29.675125682Z","Version":"v2","peerport":"9018","clientport":"9019","Domain":"portworx-1.internal.kvdb","DataDirType":"BtrfsSubvolume"}]
pwx/px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7/testConnection
200

That being said, node0 is still failing with:

Jan 13 01:56:45 node0.dimagre.com portworx[31117]: time="2021-01-13T01:56:45Z" level=error msg="unable to use bootstrap kvdb: context deadline exceeded" func=InitAndBoot package=boot
Jan 13 01:56:45 node0.dimagre.com portworx[31117]: time="2021-01-13T01:56:45Z" level=error msg="Could not init boot manager" error="unable to use bootstrap kvdb: context deadline exceeded"

Whoooo!!!

dgrekov@node3 ~ $ sudo /opt/pwx/bin/pxctl status
Status: PX is operational
License: PX-Essential (ERROR: License is expired, UserID/PX-Central endpoint is incorrect)
Node ID: 8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4
	IP: 10.0.0.85 
 	Local Storage Pool: 1 pool
	POOL	IO_PRIORITY	RAID_LEVEL	USABLE	USED	STATUS	ZONE	REGION
	0	LOW		raid0		183 GiB	25 GiB	Online	default	default
	Local Storage Devices: 1 device
	Device	Path		Media Type		Size		Last-Scan
	0:1	/dev/sda10	STORAGE_MEDIUM_MAGNETIC	183 GiB		13 Jan 21 02:02 UTC
	* Internal kvdb on this node is sharing this storage device /dev/sda10  to store its data.
	total		-	183 GiB
	Cache Devices:
	 * No cache devices
Cluster Summary
	Cluster ID: px-cluster-19235250-e738-4df6-b6e3-be7a12a535b7
	Cluster UUID: "28dc1639-652c-4933-aa45-f924d144b39d"
	Scheduler: none
	Nodes: 5 node(s) with storage (3 online)
	IP		ID					SchedulerNodeName	StorageNode	Used		Capacity	Status	StorageStatus	Version		Kernel			OS
	10.0.0.168	9ca3418a-8716-4552-8516-d6be206b2e3b	N/A			Yes		29 GiB		509 GiB		Online	Up		2.6.1.4-775a586	5.4.83-flatcar		Flatcar Container Linux by Kinvolk 2605.10.0 (Oklo)
	10.0.0.85	8c43a1ae-c3fe-4802-a6da-d6b24a55b8a4	N/A			Yes		25 GiB		183 GiB		Online	Up (This node)	2.6.1.4-775a586	5.4.83-flatcar		Flatcar Container Linux by Kinvolk 2605.10.0 (Oklo)
	10.0.0.63	6ccbf316-5263-4a0b-8b4b-0c8b1a61cc21	node2.dimagre.com	Yes		Unavailable	Unavailable	Offline	Down		2.6.1.4-775a586	5.4.72-flatcar		Flatcar Container Linux by Kinvolk 2605.7.0 (Oklo)
	10.0.0.176	24a98160-5122-464b-b8d4-b61931b65d41	node0.dimagre.com	Yes		Unavailable	Unavailable	Offline	Down		2.5.2.0-176ddb7	4.19.128-flatcar	Flatcar Container Linux by Kinvolk 2512.2.1 (Oklo)
	10.0.0.176	082975fd-c1c7-40c0-b52a-287235748ce7	N/A			Yes		0 B		0 B		Online	Up		2.6.1.4-775a586	5.4.83-flatcar		Flatcar Container Linux by Kinvolk 2605.10.0 (Oklo)
	Warnings: 
		 WARNING: Internal Kvdb is not using dedicated drive on nodes [10.0.0.85]. This configuration is not recommended for production clusters.
		 WARNING: Cluster consists of nodes with different PX versions.
Global Storage Pool
	Total Used    	:  54 GiB
	Total Capacity	:  692 GiB
    dgrekov@node3 ~ $ 

Apparently Portworx has its own resolv.conf with the same bad DNS entry; removing the bad entry fixes the issue!!!

/opt/pwx/oci/mounts/etc/resolv.conf
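To spot a stale nameserver in that mounted resolv.conf, something like this works (a sketch; the default path is the one quoted above, and which address counts as "stale" is whatever DNS server your failed cluster was serving):

```python
def nameservers(path="/opt/pwx/oci/mounts/etc/resolv.conf"):
    """Return the nameserver IPs listed in a resolv.conf-style file, in order."""
    servers = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # resolv.conf lines look like: "nameserver 10.3.0.10"
            if len(parts) >= 2 and parts[0] == "nameserver":
                servers.append(parts[1])
    return servers

# Order matters: the first entry is tried first, so a dead primary
# nameserver delays every lookup before the fallback is consulted.
```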

Ok, I will let the cluster settle and try to mount the volumes.