Pods fail to attach or detach volumes
Date: 26/10/2021 Categories: VMware

When the vSphere CSI driver is used in a multi-cluster Tanzu Kubernetes Grid Integrated Edition (TKGI) environment, pods start failing to attach or detach volumes.

Cause

The vSphere CSI driver uses the cluster-id in the volume create spec. If multiple Kubernetes clusters in the same vSphere environment use the same cluster-id, then each time one of those clusters syncs with vCenter it tags or untags the volumes in vCenter. As a result, the volumes do not attach or detach as they should.

Impact / Risks

It can take up to two hours to change cluster-ids if multiple clusters are using the same value. During that time volumes cannot be managed and new volumes cannot be created.

Resolution

Deploy each vSphere CSI driver with a unique cluster-id.
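
One way to confirm that clusters share a cluster-id is to compare the value across each cluster's CSI driver configuration. The Python sketch below is illustrative only. It assumes the configuration text of each cluster has already been exported and follows the INI-style layout commonly used for csi-vsphere.conf, with cluster-id in a [Global] section. The function names, cluster names, and sample values are hypothetical.

```python
from __future__ import annotations

import configparser
from collections import defaultdict


def read_cluster_id(conf_text: str) -> str:
    """Return the cluster-id from the [Global] section of the config text.

    Assumption: the text is INI-style, e.g.
        [Global]
        cluster-id = "some-id"
    """
    # interpolation=None so that '%' characters in other values
    # (for example passwords) do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # Values are typically quoted in this layout; strip the quotes.
    return parser["Global"]["cluster-id"].strip().strip('"')


def find_shared_cluster_ids(configs: dict[str, str]) -> dict[str, list[str]]:
    """Map every cluster-id used by more than one cluster to those clusters."""
    clusters_by_id = defaultdict(list)
    for cluster_name, conf_text in configs.items():
        clusters_by_id[read_cluster_id(conf_text)].append(cluster_name)
    return {
        cluster_id: sorted(names)
        for cluster_id, names in clusters_by_id.items()
        if len(names) > 1
    }


def sample_conf(cluster_id: str) -> str:
    """Build a hypothetical config text for demonstration purposes."""
    return (
        "[Global]\n"
        f'cluster-id = "{cluster_id}"\n'
        "\n"
        '[VirtualCenter "vc.example.com"]\n'
        'insecure-flag = "true"\n'
    )


if __name__ == "__main__":
    configs = {
        "cluster-a": sample_conf("tkgi-shared"),
        "cluster-b": sample_conf("tkgi-shared"),
        "cluster-c": sample_conf("tkgi-unique"),
    }
    for cluster_id, names in find_shared_cluster_ids(configs).items():
        print(f"cluster-id {cluster_id!r} is shared by: {', '.join(names)}")
```

Any cluster-id reported by this check is in use by more than one cluster and is a candidate for the collision described above.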

Workaround

For existing clusters, the cluster-id can be changed. One of the clusters must keep the original cluster-id.

  1. Choose one Kubernetes cluster to keep the original cluster-id value.
  2. Modify the CSI driver deployment so that the replica count is 0 on all Kubernetes clusters except for the one you selected in Step 1.
  3. Wait at least one hour to allow other volumes to de-register.
  4. Change the cluster-id value in all other Kubernetes clusters except for the one you selected in Step 1.
  5. Modify the CSI driver deployment so that the replica count is 1 on all Kubernetes clusters except for the one you selected in Step 1.
  6. Wait at least one hour to allow volumes to get re-registered.
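
The ordering of the steps above can be sketched as a small Python helper that builds a plan without executing anything. This is a rough sketch under stated assumptions: the deployment name vsphere-csi-controller and the namespace kube-system are typical defaults that may differ in your installation, the use of one kubectl context per cluster is hypothetical, and the cluster-id change in Step 4 is left as a manual action because it depends on how the driver configuration is stored.

```python
from __future__ import annotations


def build_plan(
    clusters: list[str],
    keeper: str,
    deployment: str = "vsphere-csi-controller",  # assumed deployment name
    namespace: str = "kube-system",              # assumed namespace
    wait_minutes: int = 60,
) -> list[str]:
    """Return the ordered actions for changing cluster-ids.

    `keeper` is the cluster chosen in Step 1; it is never touched.
    Each cluster name is assumed to double as its kubectl context name.
    """
    if keeper not in clusters:
        raise ValueError(f"{keeper!r} is not one of the listed clusters")

    others = [name for name in clusters if name != keeper]

    def scale(cluster: str, replicas: int) -> str:
        return (
            f"kubectl --context {cluster} -n {namespace} "
            f"scale deployment {deployment} --replicas={replicas}"
        )

    plan: list[str] = []
    # Step 2: stop the CSI controller everywhere except the keeper.
    plan += [scale(name, 0) for name in others]
    # Step 3: allow volumes to de-register.
    plan.append(f"wait at least {wait_minutes} minutes for volumes to de-register")
    # Step 4: manual change; the mechanism depends on the installation.
    plan += [f"manual: set a unique cluster-id on {name}" for name in others]
    # Step 5: start the CSI controller again on the changed clusters.
    plan += [scale(name, 1) for name in others]
    # Step 6: allow volumes to re-register.
    plan.append(f"wait at least {wait_minutes} minutes for volumes to re-register")
    return plan


if __name__ == "__main__":
    for action in build_plan(["cluster-a", "cluster-b", "cluster-c"], keeper="cluster-a"):
        print(action)
```

Generating the plan as text first makes it possible to review every command before it runs and to confirm that the cluster selected in Step 1 never appears in it.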

Related Information

This issue applies only to TKGI 1.10 and lower, because vSphere Cloud Native Storage is integrated into TKGI in version 1.11 and higher. See Cloud Native Storage (CNS) on vSphere for more information.