From a0c37a1f3fb2b1ac9ff5db243d95a0414c6cd2d1 Mon Sep 17 00:00:00 2001
From: Komh
Date: Wed, 22 Apr 2026 23:32:31 +0800
Subject: [PATCH] [configure] Pods Pending with Insufficient Memory While Nodes Show Free Memory

---
 ...ent_Memory_While_Nodes_Show_Free_Memory.md | 89 +++++++++++++++++++
 1 file changed, 89 insertions(+)
 create mode 100644 docs/en/solutions/Pods_Pending_with_Insufficient_Memory_While_Nodes_Show_Free_Memory.md

diff --git a/docs/en/solutions/Pods_Pending_with_Insufficient_Memory_While_Nodes_Show_Free_Memory.md b/docs/en/solutions/Pods_Pending_with_Insufficient_Memory_While_Nodes_Show_Free_Memory.md
new file mode 100644
index 0000000..dae2a89
--- /dev/null
+++ b/docs/en/solutions/Pods_Pending_with_Insufficient_Memory_While_Nodes_Show_Free_Memory.md
@@ -0,0 +1,89 @@
+---
+kind:
+  - Troubleshooting
+products:
+  - Alauda Container Platform
+ProductsVersion:
+  - 4.1.0,4.2.x
+---
+## Issue
+
+A pod stays in the `Pending` phase with a scheduling error similar to:
+
+```text
+0/6 nodes are available:
+  3 Insufficient memory,
+  3 node(s) had taint {node-role.kubernetes.io/control-plane:}, that the pod didn't tolerate.
+```
+
+`kubectl top node` reports plenty of free memory on the three worker nodes that were rejected for `Insufficient memory`, and the new pod requests only `500Mi`. Yet the scheduler refuses to place it.
+
+## Root Cause
+
+The scheduler does not compare a pod's request to the node's **real-time** memory usage. Instead, it compares the request to the node's **allocatable** memory minus the **sum of requests** of every pod already bound to that node. When that remaining headroom is smaller than the new pod's request, the node is full from the scheduler's perspective, even if the running pods use far less than they asked for.
+
+This is by design. `.spec.containers[].resources.requests` acts as a reservation. The scheduler sets that amount aside for the container. Under node memory pressure, the kubelet evicts pods whose usage exceeds their requests before it evicts pods that stay within them. Admitting a new pod that pushed total requests beyond allocatable would break that promise for every pod already on the node.
+
+`kubectl top` reports current utilization from the Metrics API, which is usually served by metrics-server. It is the right tool for capacity investigations and the wrong one for reasoning about scheduling. The two numbers come from different inputs, and they will legitimately disagree whenever pods request more than they currently use.
+
+## Resolution
+
+Right-size requests first; add hardware second.
+
+1. **Audit requests against actual usage** for the high-request pods on the saturated nodes. Suppose a pod reserves `4Gi` but its 7-day P95 working set is `900Mi`. That reservation is more than four times too large, so lower it. For stable workloads, adding 1.3× to 1.5× headroom over P95 is a reasonable rule of thumb. For this pod, that means a request of roughly `1170Mi` to `1350Mi`.
+
+2. **Separate requests from limits intentionally.** Setting `requests` equal to `limits` for CPU and memory in every container places the pod in the Guaranteed QoS class. That class reserves the pod's peak for its whole lifetime and consumes the most capacity. Most workloads are better served by `requests` sized for the P95 steady state and `limits` sized for the peak, which places them in the Burstable QoS class. Reserve the Guaranteed class for infrastructure components that must be among the last candidates for eviction or OOM kills.
+
+3. **Use `LimitRange` to catch regressions.** A namespace-level `LimitRange` fills in defaults for containers that omit requests or limits, and it rejects containers that exceed its ceiling. This keeps a single team from accidentally reserving an entire node:
+
+   ```yaml
+   apiVersion: v1
+   kind: LimitRange
+   metadata:
+     name: default-requests
+     namespace: team-a
+   spec:
+     limits:
+       - type: Container
+         default: { cpu: "500m", memory: "512Mi" }
+         defaultRequest: { cpu: "100m", memory: "128Mi" }
+         max: { cpu: "4", memory: "8Gi" }
+   ```
+
+4. **Scale the cluster only after requests are honest.** Adding worker nodes to cover inflated requests only buys more room for the same waste. Inflated requests also mislead the autoscalers in two ways. HorizontalPodAutoscaler utilization targets are percentages of requests. A node autoscaler adds nodes for pods that are pending on requests, not on real usage. When additional capacity is genuinely required, a HorizontalPodAutoscaler for elastic services and a node autoscaler for the fleet are usually cheaper than permanently over-provisioning both pods and nodes.
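+
+The steps above can be combined in a single manifest. The fragment below is a sketch, not a prescribed configuration. The `example-api` name, the image, the replica counts, the CPU request, the `2Gi` peak limit and the `80` percent target are all hypothetical values. Only the `1200Mi` memory request is derived from the example in step 1: `900Mi` × 1.3 ≈ `1170Mi`, rounded up.
+
+```yaml
+# Sketch only: names, image, replica counts, CPU values, the 2Gi limit and the
+# 80% target are hypothetical. The memory request follows step 1's example.
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: example-api              # hypothetical name
+  namespace: team-a
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: example-api
+  template:
+    metadata:
+      labels:
+        app: example-api
+    spec:
+      containers:
+        - name: app
+          image: registry.example.com/example-api:1.0   # placeholder image
+          resources:
+            requests:            # what the scheduler reserves: P95 plus headroom
+              cpu: "250m"
+              memory: "1200Mi"   # 900Mi x 1.3 = 1170Mi, rounded up
+            limits:              # ceiling for peaks; requests != limits -> Burstable
+              memory: "2Gi"
+---
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: example-api
+  namespace: team-a
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: example-api
+  minReplicas: 3
+  maxReplicas: 10
+  metrics:
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: Utilization
+          averageUtilization: 80   # percent of the 1200Mi request, not of the limit
+```
+
+Leaving the CPU limit unset keeps the pod in the Burstable class. In the `team-a` namespace from step 3, the `LimitRange` would fill in its `500m` default CPU limit, which is still above the `250m` request, so admission succeeds.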
+
+## Diagnostic Steps
+
+Compare allocatable memory to the sum of requests on each node:
+
+```bash
+kubectl describe node <node-name> | sed -n '/Allocated resources/,/Events/p'
+```
+
+The `Allocated resources` table lists total requests and limits for each resource, each as a percentage of the node's allocatable capacity. When the memory request total is close to 100%, the node has no room for new requests, whatever `kubectl top node` shows.
+
+Find the top memory reservers on a suspect node (largest requests first):
+
+```bash
+node=<node-name>
+kubectl get pods -A -o json \
+  --field-selector spec.nodeName="$node" \
+| jq -r '.items[] | .spec.containers[] as $c
+    | [.metadata.namespace, .metadata.name, $c.name,
+       ($c.resources.requests.memory // "0")] | @tsv' \
+| sort -k4,4 -hr | column -t
+```
+
+Compare those numbers with real usage from the Metrics API (largest consumers last):
+
+```bash
+kubectl top pod -A --containers | sort -k5 -h | tail -20
+```
+
+Inspect the pending pod's events to confirm which filter rejected each node, for example `Insufficient memory` or an untolerated taint:
+
+```bash
+kubectl describe pod <pod-name> -n <namespace>
+```
+
+If the saturated nodes are filled mostly by replicas of a single Deployment, look for a missing pod anti-affinity rule. Without one, those replicas can concentrate on the same already-saturated nodes.
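+
+A minimal sketch of the missing anti-affinity rule follows. It assumes a hypothetical Deployment whose pod template carries the label `app: example-api`; adjust the selector to the real workload's labels. The `preferred` form asks the scheduler to spread replicas across nodes but still lets it co-locate them when no other node fits. Switching to `requiredDuringSchedulingIgnoredDuringExecution` makes it a hard rule, and a hard rule can itself leave replicas `Pending`.
+
+```yaml
+# Fragment of a Deployment's pod template (hypothetical label app: example-api).
+spec:
+  template:
+    spec:
+      affinity:
+        podAntiAffinity:
+          # Soft rule: prefer one replica per node (hostname topology).
+          preferredDuringSchedulingIgnoredDuringExecution:
+            - weight: 100
+              podAffinityTerm:
+                labelSelector:
+                  matchLabels:
+                    app: example-api
+                topologyKey: kubernetes.io/hostname
+```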