From 4795c6fcb862e1730e3810a006db57d74be46dbf Mon Sep 17 00:00:00 2001 From: Komh Date: Wed, 22 Apr 2026 23:33:18 +0800 Subject: [PATCH] [storage] Resolving Multi-Attach Errors for ReadWriteOnce PersistentVolumes --- ...ors_for_ReadWriteOnce_PersistentVolumes.md | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 docs/en/solutions/Resolving_Multi_Attach_Errors_for_ReadWriteOnce_PersistentVolumes.md diff --git a/docs/en/solutions/Resolving_Multi_Attach_Errors_for_ReadWriteOnce_PersistentVolumes.md b/docs/en/solutions/Resolving_Multi_Attach_Errors_for_ReadWriteOnce_PersistentVolumes.md new file mode 100644 index 00000000..06bdf694 --- /dev/null +++ b/docs/en/solutions/Resolving_Multi_Attach_Errors_for_ReadWriteOnce_PersistentVolumes.md @@ -0,0 +1,100 @@ +--- +kind: + - Troubleshooting +products: + - Alauda Container Platform +ProductsVersion: + - 4.1.0,4.2.x +--- +## Issue + +A pod stays in `ContainerCreating` with events similar to: + +```text +Warning FailedAttachVolume ... attachdetach-controller + Multi-Attach error for volume "pvc-abc" + Volume is already exclusively attached to one node and can't be attached to another +``` + +The workload appears healthy on its original node, but a new replica, a drained pod, or a replacement cannot start anywhere else. + +## Root Cause + +A PersistentVolume declared with `accessModes: [ReadWriteOnce]` can be published to **one** node at a time. The cluster tracks this ownership through a `VolumeAttachment` object created by the CSI attach/detach controller. When the controller observes a second pod trying to consume the same PVC on a different node before the first attachment is released, it blocks the second attach and surfaces the `Multi-Attach` error. + +Common ways this state arises: + +- A node became `NotReady` while a pod was running on it. The kubelet on that node can no longer confirm detach, so the `VolumeAttachment` is not cleaned up. +- A Deployment using `RollingUpdate` strategy has `maxSurge > 0` while its PVC is `RWO` — the replacement pod is scheduled before the old one releases the volume. +- A pod was force-deleted (`--grace-period=0 --force`) while still attached. The API object is gone, but the CSI driver never received a detach call. +- Workload topology drifted so two replicas that share a `RWO` PVC land on different nodes. + +## Resolution + +1. **Confirm the access mode and controller type.** If the volume is truly single-writer and the workload runs multiple replicas, it is the workload that needs to change, not the volume: + + ```bash + kubectl get pvc -o jsonpath='{.spec.accessModes}' + kubectl get deployment -o jsonpath='{.spec.strategy}' + ``` + + For a single-writer workload use `strategy: Recreate` (or a StatefulSet with one replica). For a multi-writer workload, request `ReadWriteMany` and pick a CSI driver that supports it. + +2. **If the original node is gone or stuck**, remove the stale attachment so the volume can bind elsewhere: + + ```bash + kubectl get volumeattachment -o wide | grep + kubectl describe volumeattachment + ``` + + If the holder node is permanently offline and will not come back, delete the `VolumeAttachment`; the CSI controller will re-reconcile. Never delete the `VolumeAttachment` while the holder node is healthy — it will race with the in-flight attach and can corrupt the volume. Drain the node first: + + ```bash + kubectl cordon + kubectl drain --ignore-daemonsets --delete-emptydir-data + # only after the node is drained / confirmed dead: + kubectl delete volumeattachment + ``` + +3. **If the holder pod is terminating-but-stuck**, find it and let the kubelet finish detach. Avoid `--force`: + + ```bash + kubectl get pod -A -o wide | grep + kubectl describe pod -n + # check kubelet logs on the node for unmount failures + ``` + +4. **Prevent recurrence.** Encode the single-writer invariant in the manifests: `strategy: Recreate` on Deployments, `podAntiAffinity` to keep replicas together when shared RWO is required, and where possible move data to a shared-filesystem provisioner (e.g. a CSI driver that exposes `ReadWriteMany`). + +## Diagnostic Steps + +Inspect the PersistentVolume and PVC to confirm access mode and reclaim policy: + +```bash +kubectl describe pv +kubectl describe pvc -n +``` + +List all VolumeAttachments and identify the one holding the PV: + +```bash +kubectl get volumeattachment -o custom-columns='NAME:.metadata.name,ATTACHED:.status.attached,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName' \ + | grep +``` + +Confirm the holder node is reachable and has not expired its lease: + +```bash +kubectl get node -o wide +kubectl get lease -n kube-node-lease -o yaml +``` + +Check events on the pod, the volume, and the node: + +```bash +kubectl -n describe pod +kubectl get events -A --field-selector involvedObject.kind=VolumeAttachment +kubectl get events -A --field-selector involvedObject.name= +``` + +If the CSI driver exposes metrics, scrape `csi_operations_seconds{operation="ControllerUnpublishVolume"}` — a spike usually indicates the driver is the bottleneck rather than Kubernetes.