feat: improve cse bootstrap latency by deferring non-critical work #8105
awesomenix wants to merge 1 commit into main
Conversation
Pull request overview
This PR reduces Linux CSE bootstrap critical-path work by deferring non-essential steps until after ensureKubelet, and updates generated pkg/agent/testdata snapshots to reflect the new CSE/custom data output.
Changes:
- Defers `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`, and non-GPU driver cleanup until after `ensureKubelet` in `cse_main.sh`.
- Optimizes provisioning/runtime setup by switching kube binary activation to `mv` + `chmod`, and reloading only a targeted sysctl file instead of `sysctl --system`.
- Updates VHD cleanup to disable `containerd` and regenerates `pkg/agent/testdata` CustomData snapshots.
Reviewed changes
Copilot reviewed 18 out of 75 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup to avoid shipping images with it enabled. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers some non-critical steps until after ensureKubelet; skips container runtime install for golden images/OSGuard. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Changes kubelet/kubectl "activation" to mv + chmod to avoid redundant copy work. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot for updated CSE/custom data output. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot for updated CSE/custom data output. |
```sh
mv "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
mv "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl
chmod a+x /opt/bin/kubelet /opt/bin/kubectl
```
This was what was there before; keeping it as is.
Why this change? I'm not understanding. Was install cleaner, but slower?
Also, why not force the access level?
Also curious about both.
install does a copy, not a move.
Operation: it copies the file to the destination. A key difference from cp is that install unlinks (removes) the destination file first if it already exists, which can prevent issues (like an ETXTBSY "text file busy" error) when replacing a running executable.
I kept the operation as it was before chewi made the change, to avoid regression. Not sure if it was better or worse, but it was guaranteed to work with no regression.
Regression? My change was merged two months ago. There are important reasons to use install over cp, including the one stated above. There are cases where the destination will be an existing symlink, and it is crucial that we replace the symlink, not its target. mv will do that, but I can't remember if there was some other reason why I didn't stick with mv.
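To make the distinction above concrete, here is a small sketch (hypothetical temp files, not the AKS paths) of how `cp`, `install`, and `mv` each treat a destination that is an existing symlink:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d); cd "$tmp"

echo old > target
ln -s target dest            # dest is a symlink to target

echo one > src1
cp src1 dest                 # cp follows the symlink and writes INTO target
test -L dest && echo "cp: dest is still a symlink; target now holds: $(cat target)"

echo two > src2
install -m0755 src2 dest     # install unlinks dest first, then copies
test ! -L dest && echo "install: dest replaced by a regular file"

ln -sf target dest           # recreate the symlink
echo three > src3
mv src3 dest                 # mv renames over dest, also replacing the symlink
test ! -L dest && echo "mv: dest replaced by a regular file"
```

So both `install` and `mv` replace the symlink itself, matching the comment above; `install` additionally copies (and its unlink-first behavior sidesteps ETXTBSY when overwriting a running binary), while `mv` is a cheap rename when source and destination are on the same filesystem.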
Force-pushed 7e6264c to 192e020 (compare)
Force-pushed 192e020 to c95a094 (compare)
Pull request overview
This PR aims to reduce Linux CSE bootstrap critical-path latency by deferring non-critical steps until after ensureKubelet, avoiding redundant work (targeted sysctl reload, moving kube binaries), and adjusting VHD build/runtime behaviors around containerd.
Changes:
- Reorders CSE provisioning steps so kubelet starts earlier; starts `kubelet` before `measure-tls-bootstrapping-latency.service`.
- Optimizes provisioning work (targeted `sysctl -p`, `mv` + `chmod` for kube binaries, skipping runtime install when the golden image already contains it).
- Adjusts VHD build scripts/tests to ensure containerd is started when needed and disabled during image cleanup; regenerates `pkg/agent/testdata`.
Reviewed changes
Copilot reviewed 18 out of 77 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provision helpers and ensures containerd is started before Trivy operations. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Starts containerd before executing VHD validation tests. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Uses mv + chmod when activating downloaded kubelet/kubectl. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and starts kubelet before the TLS bootstrapping latency measurement service. |
| pkg/agent/testdata/MarinerV2+Kata/CustomData | Regenerated snapshot output for MarinerV2+Kata CustomData. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot output for CustomizedImage CustomData. |
Pull request overview
This PR aims to reduce Linux CSE bootstrap critical-path latency by deferring non-essential provisioning work until after ensureKubelet, avoiding some redundant work, and adjusting TLS bootstrapping latency measurement to start timing immediately before kubelet start.
Changes:
- Reordered `cse_main.sh` operations to start kubelet earlier (deferring LocalDNS, promiscuous bridge dedupe, and non-GPU driver cleanup).
- Updated TLS bootstrapping latency measurement to use a start-time file written just before kubelet start, and adjusted ShellSpec coverage accordingly.
- Tweaked VHD build/test scripts to (re-)enable/start containerd for scanning/tests and disable it during VHD cleanup; regenerated `pkg/agent/testdata`.
Reviewed changes
Copilot reviewed 19 out of 80 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provisioning helpers and ensures containerd is started before trivy scans. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Starts containerd before running VHD content validations. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd at image cleanup time. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Updates ShellSpec expectations for new start-time-file-based TLS measurement behavior. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot testdata for updated custom data/CSE output. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Uses a persisted start time (written pre-kubelet) and emits completion events even on fast-start/race scenarios. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Adds ScriptlessMode datapoint into GA event message payload. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Switches kube binary activation from install to mv + chmod. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p and writes TLS start time immediately before starting kubelet; starts kubelet before the measurement service. |
```sh
systemctlEnableAndStart containerd 30 || exit 4
```

```sh
systemctl daemon-reload
systemctl disable --now containerd
```
```sh
WATCH_TIMEOUT_SECONDS=${WATCH_TIMEOUT_SECONDS:-300} # default to 5 minutes
```

```sh
createGuestAgentEvent() {
    local task=$1; startTime=$2; endTime=$3;
```
This is weird, we just moved to using install. Is this a side effect of AI? cc: @chewi to comment, since he's the one who migrated to using install.
```diff
@@ -144,6 +145,8 @@ else
     exit 1
 fi
+systemctlEnableAndStart containerd 30 || exit 4
```
This feels weird. Is it because we start containerd in the CSE? Normally we just used to reload it, no?
Yeah, we start it during CSE.
djsly left a comment
How are we saving 20 sec exactly? Is it a combination of all those changes, or is there one major change?
```diff
@@ -473,8 +461,21 @@ function nodePrep {
+logs_to_events "AKS.CSE.ensureKubelet" ensureKubelet
```
Can we teach the AI in the PR that it should report when changes are introduced before ensureKubelet, which will have a direct impact on node bootstrapping time?
```sh
fi

if [ "${SHOULD_ENABLE_LOCALDNS}" = "true" ]; then
    logs_to_events "AKS.CSE.enableLocalDNS" enableLocalDNS || exit $ERR_LOCALDNS_FAIL
```
Do kubelet and running pods require localDNS? What if kubelet starts, the node registers, and local DNS isn't ready?
Nope. Also, previously it was just a start, not a wait for local DNS to be online.
```sh
logs_to_events "AKS.CSE.ensureKubelet" ensureKubelet

if [ "${ENSURE_NO_DUPE_PROMISCUOUS_BRIDGE}" = "true" ]; then
    logs_to_events "AKS.CSE.ensureNoDupOnPromiscuBridge" ensureNoDupOnPromiscuBridge
```
Weird that the service file has

```ini
[Unit]
Description=Add dedup ebtable rules for kubenet bridge in promiscuous mode
After=containerd.service
After=kubelet.service
[Service]
```

`systemctlEnableAndStart ensure-no-dup 30` would run before, and simply wanted to auto-start in case either containerd or kubelet starts.
In this case, I'm not sure if calling the enable/start will have an effect, now that we actually start containerd/kubelet before?
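A note on the unit semantics in question, as I understand them: `After=` is ordering-only. It delays this unit relative to containerd/kubelet only when those units are queued in the same systemd transaction; it neither pulls them in nor re-triggers this unit when they start later. Roughly (sketch, not the real unit file):

```ini
[Unit]
# Ordering only: if kubelet.service is in the same start transaction,
# start this unit after it. No effect if kubelet starts later.
After=kubelet.service
# A dependency directive would be needed to actually pull kubelet in:
# Wants=kubelet.service
```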
```diff
 EOF
-retrycmd_if_failure 120 5 25 sysctl --system || exit $ERR_SYSCTL_RELOAD
+# ensureSysctl occurs after this, we already call sysctl --system in ensureSysctl, calling it here is a waste
+retrycmd_if_failure 120 5 25 sysctl -p /etc/sysctl.d/99-force-bridge-forward.conf || exit $ERR_SYSCTL_RELOAD
```
How much time are we saving here? Could we hide an issue here where directly loading this file works well, but there is a bug in the final load order?
Rule of thumb:
- Debugging one value → `sysctl -p`
- Apply persistent config correctly → ✅ `sysctl --system`
- Simulate boot exactly → `systemctl restart systemd-sysctl`
Not true: we have a step, ensureSysctl, which does exactly this; performing it in the containerd step is redundant.
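To illustrate the scope difference being debated, a print-only sketch with hypothetical files (actually applying values needs root): a targeted `sysctl -p <file>` parses exactly one file, while `sysctl --system` walks every `*.conf` under the sysctl.d directories.

```shell
#!/bin/sh
set -eu
d=$(mktemp -d)/sysctl.d
mkdir -p "$d"
printf 'net.bridge.bridge-nf-call-iptables = 1\n' > "$d/99-force-bridge-forward.conf"
printf 'vm.swappiness = 0\nnet.ipv4.ip_forward = 1\n' > "$d/999-extra.conf"

# what `sysctl -p $d/99-force-bridge-forward.conf` would parse: one file, one key
echo "targeted: $(grep -c = "$d/99-force-bridge-forward.conf") setting(s)"

# what `sysctl --system` would parse: every *.conf, all keys
echo "system-wide: $(cat "$d"/*.conf | grep -c =) setting(s)"
```

The full walk is the correct tool for persistence (it honors directory precedence), which is exactly what the later ensureSysctl step does; doing it twice is the waste being removed here.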
Force-pushed 6e1ae0b to 2fc37be (compare)
e2e/config/vhd.go (outdated)

```go
Arch:    "arm64",
Distro:  datamodel.AKSAzureLinuxV3Arm64Gen2,
Gallery: imageGalleryLinux,
Flatcar: true,
```
ugh! copy paste error.
e2e/scenario_test.go (outdated)

```go
func Test_AzureLinuxV3_ARM64(t *testing.T) {
	RunScenario(t, &Scenario{
		Description: "Tests that a node using a Flatcar VHD on ARM64 architecture can be properly bootstrapped",
```
Shouldn't this be "Tests that a node using an AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped"?
```sh
logs_to_events "AKS.CSE.ensureKubelet" ensureKubelet

if [ "${ENSURE_NO_DUPE_PROMISCUOUS_BRIDGE}" = "true" ]; then
```
So, just noting that these changes aren't improving CSE latency, right? They're just improving node registration latency (which is good, don't get me wrong). Though from the RP side, the operation should end up taking the same amount of time, since RP will block synchronously on the CRP call to create/update the VM/VMSS, which is solely dependent on CSE execution time, not node registration time.
That would be half true: due to reordering some startup components, the dependent components start faster, hence a faster overall CSE finish. But at the end of the day we are still at the mercy of CSE.
But you are right, we are purely focused on node registration speed-up.
```sh
    fi
fi
install -m0755 "/opt/bin/kubelet-${KUBERNETES_VERSION}" /opt/bin/kubelet
install -m0755 "/opt/bin/kubectl-${KUBERNETES_VERSION}" /opt/bin/kubectl
```
Yeah, wondering why install would be taking longer. If the correct version of kubelet/kubectl is cached on the VHD, it should just move it to /opt/bin, no?
```sh
fi

# start measure-tls-bootstrapping-latency.service without waiting for the main process to start, while ignoring any failures
if ! systemctlEnableAndStartNoBlock measure-tls-bootstrapping-latency 30; then
```
Is this really adding a lot of latency? systemctlEnableAndStartNoBlock explicitly does not block, meaning systemd doesn't wait for the unit to enter a running state before returning.
Mainly asking since moving this below ensureKubelet seems to add a fair bit of complexity.
Not true: we actually restart it during our CSE provisioning after we drop the config, so this avoids performing double restarts, which would possibly be slower since a restart is slower than a start.
Force-pushed 2fc37be to d8a4b88 (compare)
Pull request overview
This PR aims to reduce Linux node provisioning (CSE) bootstrap latency by moving non-critical work off the kubelet startup critical path, and by trimming some redundant/expensive operations during provisioning and VHD validation.
Changes:
- Reorders portions of the Linux CSE flow so kubelet starts earlier, and updates TLS bootstrapping latency measurement to use a start-time file written immediately before kubelet startup.
- Optimizes certain provisioning steps (targeted `sysctl -p`, `mv` + `chmod` for kube binaries) and adjusts VHD/Packer scripts to manage containerd state for scanning/tests/cleanup.
- Adds an Azure Linux V3 Gen2 ARM64 E2E image definition and scenario, and regenerates `pkg/agent/testdata` outputs.
Reviewed changes
Copilot reviewed 20 out of 82 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Sources provisioning helpers and explicitly enables/starts containerd before running Trivy scans. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Reloads systemd and restarts containerd at test start. |
| vhdbuilder/packer/cleanup-vhd.sh | Disables containerd during VHD cleanup to avoid shipping it enabled. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Updates ShellSpec coverage for TLS bootstrapping latency measurement behavior and the new start-time file logic. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot testdata for updated custom data/CSE outputs. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Adds configurable start-time filepath and emits completion events based on a pre-kubelet start timestamp. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Adds a ScriptlessMode datapoint to the guest agent event message payload. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defers non-critical steps until after ensureKubelet; skips container runtime install when full install isn't required. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Uses mv + chmod when activating downloaded kubelet/kubectl binaries. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Uses targeted sysctl -p for bridge-forwarding config; writes TLS bootstrapping start time before starting kubelet; starts the measurement service after kubelet. |
| e2e/scenario_test.go | Adds an Azure Linux V3 Gen2 ARM64 E2E scenario. |
| e2e/config/vhd.go | Adds an Azure Linux V3 Gen2 ARM64 image definition for E2E. |
```diff
 createGuestAgentEvent() {
     local task=$1; startTime=$2; endTime=$3;
-    local eventsFileName=$(date +%s%3N)
+    local eventsFileName
+    eventsFileName=$(date +%s%3N)
```
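The snippet above splits the declaration from the assignment; for context, this is the standard fix (shellcheck SC2155) for the fact that in `local var=$(cmd)`, `$?` reflects `local` (which succeeds), not `cmd`. A minimal demonstration:

```shell
#!/usr/bin/env bash
combined() {
    local out=$(false)            # false fails, but `local` succeeds...
    echo "combined sees \$?=$?"   # ...so $? is 0 here
}

split() {
    local out
    out=$(false)                  # assignment alone propagates the status
    echo "split sees \$?=$?"      # $? is 1 here
}

combined
split
```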
```go
func Test_AzureLinuxV3_ARM64(t *testing.T) {
	RunScenario(t, &Scenario{
		Description: "Tests that a node using a AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped",
```
Force-pushed d8a4b88 to 0cd55f9 (compare)
Force-pushed 0cd55f9 to 6e10943 (compare)
Pull request overview
This PR reduces Linux CSE bootstrap critical-path work by moving non-essential steps later, adds more precise TLS bootstrapping latency measurement, and updates test artifacts/config to reflect the new behavior.
Changes:
- Defer non-critical CSE steps until after `ensureKubelet` and adjust containerd/sysctl handling to reduce redundant work.
- Start kubelet before the TLS-bootstrapping-latency measurement service, using a start-time file to preserve the latency signal.
- Add/adjust tests and testdata, including a new AzureLinux V3 ARM64 e2e scenario and regenerated `pkg/agent/testdata`.
Reviewed changes
Copilot reviewed 21 out of 83 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| vhdbuilder/packer/trivy-scan.sh | Source provisioning helpers and ensure containerd is started before scanning. |
| vhdbuilder/packer/test/linux-vhd-content-test.sh | Restart containerd before running VHD content tests. |
| vhdbuilder/packer/cleanup-vhd.sh | Disable containerd during VHD cleanup. |
| spec/parts/linux/cloud-init/artifacts/measure_tls_bootstrapping_latency_spec.sh | Update ShellSpec coverage for new TLS bootstrapping start-time behavior and race handling. |
| pkg/agent/testdata/CustomizedImage/CustomData | Regenerated snapshot testdata reflecting updated custom data/CSE output. |
| parts/linux/cloud-init/artifacts/measure-tls-bootstrapping-latency.sh | Use start-time file, emit completion event even for fast/racy kubeconfig creation, and improve quoting. |
| parts/linux/cloud-init/artifacts/kubelet.service | Add pre-start wait for containerd socket. |
| parts/linux/cloud-init/artifacts/cse_start.sh | Emit scriptless-mode datapoint in the guest agent event message. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Defer non-critical steps until after kubelet startup; skip container runtime install on golden images. |
| parts/linux/cloud-init/artifacts/cse_install.sh | Activate kube binaries via mv + chmod rather than install. |
| parts/linux/cloud-init/artifacts/cse_config.sh | Use targeted sysctl -p, start containerd non-blocking, and write TLS bootstrap start-time before starting kubelet. |
| e2e/scenario_test.go | Add AzureLinux V3 ARM64 scenario. |
| e2e/config/vhd.go | Add e2e image config for AzureLinux V3 ARM64 Gen2. |
```ini
ExecStartPre=-/sbin/iptables -t nat --numeric --list
ExecStartPre=/bin/bash /opt/azure/containers/validate-kubelet-credentials.sh
ExecStartPre=/bin/sh -c 'until [ -S /run/containerd/containerd.sock ]; do sleep 0.1; done'
```
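The `ExecStartPre` line above polls for the containerd socket before kubelet launches. A runnable sketch of the same pattern, using a plain file as a stand-in for the socket and an explicit retry cap (the unit itself relies on systemd's `TimeoutStartSec` to bound the loop):

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)

# stand-in for containerd creating /run/containerd/containerd.sock
( sleep 0.3; touch "$tmp/containerd.sock" ) &

# poll until the path exists, up to ~5s (50 * 0.1s)
tries=50
until [ -e "$tmp/containerd.sock" ]; do
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || { echo "timed out" >&2; exit 1; }
    sleep 0.1
done
echo "socket present, kubelet could start now"
wait
```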
```sh
systemctl daemon-reload && systemctl restart containerd
```
```go
func Test_AzureLinuxV3_ARM64(t *testing.T) {
	RunScenario(t, &Scenario{
		Description: "Tests that a node using a AzureLinuxV3 VHD on ARM64 architecture can be properly bootstrapped",
```
Force-pushed 6e10943 to 9c85db1 (compare)
Summary
- Defers non-critical work until after `ensureKubelet`
- Optimizes provisioning: targeted `sysctl -p`, moving kube binaries instead of copying them, and skipping container runtime install when the golden image already has it
- Regenerates `pkg/agent/testdata` to capture the updated CSE/custom data output

What changed
- Moves `ensureNoDupOnPromiscuBridge`, `enableLocalDNS`, and non-GPU driver cleanup later in `cse_main.sh` so kubelet startup happens earlier
- Starts `kubelet` before `measure-tls-bootstrapping-latency.service`
- Replaces `install` with `mv` plus `chmod` when activating downloaded `kubelet`/`kubectl` binaries
- Reloads only `/etc/sysctl.d/99-force-bridge-forward.conf` instead of running `sysctl --system`
- Disables `containerd` during VHD cleanup so the image does not carry it enabled prematurely

Validation

Timings

| Before | After |
|---|---|