Change Portworx Node Type / Node Pool Migration with zero downtime

This document explains how to change node/instance types or migrate node pools on cloud or VMware.

Example Scenarios:

  1. You have 3 storage nodes in your PX cluster (EC2 instance type = t2.micro) and want to replace each of them with a new node (EC2 instance type = t3.micro).

  2. You have a mix of VMs and physical boxes that were using 100GB disks for Portworx, and you want to re-initialise Portworx with larger capacity disks on each node OR replace the physical boxes with VMs completely.

There are 2 ways to achieve this:

  1. Add the 3 new nodes to the cluster first and then evacuate the 3 older nodes. This requires you to request an additional license for the extra nodes, since the cluster will have a total of 6 nodes.
  2. Evacuate the nodes in the cluster one by one, replacing each with a new node. This keeps the total number of nodes in the cluster the same (3 nodes).
  • Evacuating a node here means moving all PX volume replicas to another node or to the replacement node, and deleting snapshots from the node.
  • It also means draining and cordoning the node so that no application pods can be scheduled there.

Brief Steps:
a. Evacuate the node and decommission it.
b. Let the new replacement node join the PX cluster and start moving back the volume replicas.

Node Evacuation:

Before proceeding:

  • Drain the node and ensure no PX volumes are attached to it. You may need to scale down the applications.
  • Ensure there are always at least 2 replicas of each PX volume. This prevents data loss and lets your applications continue to run on other nodes.

mount | grep pxd   # Should return no PXD devices mounted
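
The preparation described above can be sketched as two small shell functions. This is a minimal sketch, not an official procedure: the function names are hypothetical, the node name is a placeholder you supply, and the `--delete-emptydir-data` flag assumes a reasonably recent kubectl (older releases used a different flag name).

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of Portworx or Kubernetes.

# Cordon and drain a Kubernetes node so no application pods remain on it.
# $1 = Kubernetes node name (placeholder supplied by the caller).
prepare_node_for_evacuation() {
  local node="$1"
  # Mark the node unschedulable (drain also does this; kept explicit for clarity).
  kubectl cordon "$node" || return 1
  # Evict pods; DaemonSet-managed pods are skipped, emptyDir data is discarded.
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
}

# Run on the node itself: succeeds only when no pxd device is mounted.
check_no_pxd_mounts() {
  if mount | grep -q pxd; then
    echo "PXD devices are still mounted on this node" >&2
    return 1
  fi
  echo "No PXD devices mounted"
}
```

For example, run `prepare_node_for_evacuation <node-name>` from a machine with kubectl access, then run `check_no_pxd_mounts` on the node before going further.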

Step-1: To check whether the node has any volumes, use the following commands:

pxctl status   # Lists all nodes, with the node ID in the 2nd column
pxctl volume list --node <node-id obtained in above step>

Step-2: To evacuate the node, move (reduce) the volume replicas and delete any local snapshots present on the node.
Please refer to the following link for replica move or add operations.
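
As a rough illustration of the replica reduction in Step-2, the sketch below wraps `pxctl volume ha-update`. It assumes the `--repl` and `--node` options behave as described in the Portworx CLI help (check `pxctl volume ha-update --help` on your version). The function name and the guard against going below 2 replicas are hypothetical additions for this example.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and its guard are illustrative.

# Lower a volume's replication factor by dropping the replica on one node.
# $1 = volume name or ID, $2 = Portworx node ID (from `pxctl status`),
# $3 = replication factor the volume should have afterwards.
remove_replica_from_node() {
  local volume="$1" node_id="$2" new_repl="$3"
  # Keep at least 2 replicas, as required in the prerequisites, to avoid data loss.
  if [ "$new_repl" -lt 2 ]; then
    echo "Refusing to go below 2 replicas for $volume" >&2
    return 1
  fi
  pxctl volume ha-update --repl "$new_repl" --node "$node_id" "$volume"
}
```

If a volume has only 2 replicas, reducing it would break the 2-replica rule. In that case, first add a replica on another node and wait for it to finish syncing, then remove the replica from the node being evacuated.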

Step-3: Label the node as px/enabled=false, stop the Portworx service, and finally run the node wipe:

systemctl stop portworx
pxctl sv nodewipe --all

This step ensures that Portworx is not installed again on the node by the Portworx DaemonSet.
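
Step-3 can be sketched as follows. The function names are hypothetical, and the labeling command assumes the node is managed through kubectl. The stop and wipe commands are the ones given in Step-3 and are meant to be run on the node being decommissioned.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative.

# From a machine with kubectl access: label the node so Portworx is not reinstalled there.
# $1 = Kubernetes node name (placeholder supplied by the caller).
disable_px_on_node() {
  local node="$1"
  kubectl label nodes "$node" px/enabled=false --overwrite
}

# On the node being decommissioned: stop Portworx, then wipe it.
stop_and_wipe_px() {
  systemctl stop portworx || return 1
  pxctl sv nodewipe --all
}
```

Labeling first and wiping last keeps the order given in Step-3.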

Replacement Node:
Moving back Volume Replicas:
