kzzz
August 6, 2024, 6:56pm
Detailed Description
I used Portworx as distributed storage for container storage interface (CSI) drivers. When I upgraded my CAPI-based cluster, it caused “repaving”: replacing old nodes in the cluster one by one with new nodes that have the new desired state in place. While worker machines were repaving, CAPI continuously killed `px-<cluster-name>` pods, so the CSI plugin never got re-registered and the node hung forever. As a result, cluster upgrades got stuck. Since CAPI drains nodes with `--ignore-daemonsets --delete-emptydir-data --force`, is it possible for Portworx to deploy the `px-<cluster-name>` pods via a DaemonSet?
opened 05:29PM - 06 Aug 24 UTC
closed 12:33PM - 30 Sep 24 UTC
kind/feature
priority/important-soon
triage/accepted
### What would you like to be added (User Story)?
As a developer, I would like to be able to filter out certain pods (so that those pods won't be deleted) during the node draining process.
### Detailed Description
I used Portworx as distributed storage for container storage interface (CSI) drivers. When I upgraded my cluster, it caused “repaving”: replacing old nodes in the cluster one by one with new nodes that have the new desired state in place. While worker machines were repaving, CAPI continuously killed `px-<cluster-name>` pods, so the CSI plugin never got re-registered and the node hung forever. As a result, cluster upgrades got stuck.
In this case, if we had a field in the machine spec to filter out pods that we don't want CAPI to delete during the node draining process, then pods like `px-<cluster-name>` could be re-registered and repaving would complete successfully. As discussed, we may need to skip deleting pods that have the following toleration:
```yaml
- effect: NoSchedule
  key: node.kubernetes.io/unschedulable
  operator: Exists
```
```go
var (
	unreachableToleration = corev1.Toleration{
		Key:      nodeUnreachableKey,
		Effect:   corev1.TaintEffectNoSchedule,
		Operator: corev1.TolerationOpExists,
	}
)
```
```go
drainer := []kubedrain.PodFilter{
	SkipFuncGenerator(m.Spec.NodeDrainPodFilters),
}
```
```go
func skipUnreachableTolerationPods(pod corev1.Pod) kubedrain.PodDeleteStatus {
	if pod.Spec.Tolerations == nil {
		return kubedrain.MakePodDeleteStatusOkay()
	}
	if HasTolerations(&pod, &unreachableToleration) {
		return kubedrain.MakePodDeleteStatusSkip()
	}
	return kubedrain.MakePodDeleteStatusOkay()
}
```
With the helper function:
```go
func HasTolerations(pod *corev1.Pod, toleration *corev1.Toleration) bool {
	for _, t := range pod.Spec.Tolerations {
		if t.MatchToleration(toleration) {
			return true
		}
	}
	return false
}
```
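The sketch above depends on Kubernetes client types, so here is a dependency-free illustration of the matching semantics it relies on. `Toleration`, `matchToleration`, and `hasToleration` are simplified stand-ins (assumptions, not the real API) for `corev1.Toleration`, its `MatchToleration` method, and the `HasTolerations` helper; `MatchToleration` compares all four fields for equality.

```go
package main

import "fmt"

// Toleration mirrors the fields of corev1.Toleration relevant here
// (simplified stand-in for illustration only).
type Toleration struct {
	Key, Operator, Value, Effect string
}

// matchToleration follows the semantics of corev1.Toleration.MatchToleration:
// two tolerations match when all four fields are equal.
func matchToleration(t, m Toleration) bool {
	return t.Key == m.Key && t.Effect == m.Effect &&
		t.Operator == m.Operator && t.Value == m.Value
}

// hasToleration is the dependency-free analogue of the HasTolerations helper:
// it reports whether any of the pod's tolerations matches the given one.
func hasToleration(podTolerations []Toleration, want Toleration) bool {
	for _, t := range podTolerations {
		if matchToleration(t, want) {
			return true
		}
	}
	return false
}

func main() {
	unschedulable := Toleration{
		Key:      "node.kubernetes.io/unschedulable",
		Operator: "Exists",
		Effect:   "NoSchedule",
	}
	daemonSetPod := []Toleration{unschedulable}
	ordinaryPod := []Toleration{{Key: "other", Operator: "Exists"}}

	fmt.Println(hasToleration(daemonSetPod, unschedulable)) // true: skip during drain
	fmt.Println(hasToleration(ordinaryPod, unschedulable))  // false: evict as usual
}
```

This is why a DaemonSet-managed `px-<cluster-name>` pod would survive the drain: DaemonSet pods automatically carry the `node.kubernetes.io/unschedulable:NoSchedule` toleration, so a filter keyed on that toleration leaves them in place.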
<del>We may add a field called `NodeDrainPodFilters` to `MachineSpec`. We could also add this field to the `KubeadmControlPlaneTemplateMachineTemplate` struct</del>
<del>NodeDrainPodFilters *metav1.LabelSelector `json:"nodeDrainPodFilters,omitempty"`</del>
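The `SkipFuncGenerator` referenced in the `drainer` snippet is left undefined in the proposal. A hypothetical, dependency-free sketch of what it could look like, with the `*metav1.LabelSelector` reduced to a plain map of required labels for illustration (a real implementation would convert the selector with `metav1.LabelSelectorAsSelector` and return a `kubedrain.PodFilter`):

```go
package main

import "fmt"

// SkipFuncGenerator is a hypothetical sketch: given the proposed
// NodeDrainPodFilters selector (reduced here to a map of required labels),
// it returns a predicate reporting whether a pod should be skipped during
// draining based on its labels.
func SkipFuncGenerator(selector map[string]string) func(podLabels map[string]string) bool {
	return func(podLabels map[string]string) bool {
		// Treat an empty selector as "skip nothing" for this opt-in list.
		if len(selector) == 0 {
			return false
		}
		// Skip only when every required label is present with the same value.
		for k, v := range selector {
			if podLabels[k] != v {
				return false
			}
		}
		return true
	}
}

func main() {
	skip := SkipFuncGenerator(map[string]string{"name": "portworx"})
	fmt.Println(skip(map[string]string{"name": "portworx", "app": "px"})) // true: pod survives the drain
	fmt.Println(skip(map[string]string{"app": "nginx"}))                  // false: pod is evicted as usual
}
```

Note that a nil or empty `*metav1.LabelSelector` would need explicit handling in the real implementation: in Kubernetes semantics an empty label selector matches every object, which is the opposite of what an opt-in skip list should do.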
### Anything else you would like to add?
_No response_
### Label(s) to be applied
/kind feature