feat(orch): support rootfs resizing when building from template #2406
Previously, building a template from another template (FromTemplate) always reused the source template's build artifacts as-is, with no ability to change the disk size. The rootfs size was baked into the header at base layer creation time and copied identically through every subsequent diff generation. Since resize2fs requires a local ext4 file, there was no way to resize the diff-chain-based virtual block device.

This commit modifies the FromTemplate path in the base build phase to support disk resizing by materializing the source template's rootfs into a new ext4 file:

1. The source template's rootfs is loaded from storage via NBD (the same mechanism as cmd/mount-build-rootfs) and mounted read-only.
2. A fresh ext4 filesystem is created at the target size.
3. All files are copied from source to destination via rsync.
4. Fresh provisioning files (envd, busybox, provision script, system configs) are written to overwrite any stale versions from the source template.
5. The ext4 filesystem is shrunk, resized to the target DiskSizeMB, and integrity-checked.
6. The sandbox is re-provisioned (systemd install via busybox init).
7. A new snapshot layer is created (systemd boot, pause, upload), following the same flow as buildLayerFromOCI.

The FromTemplate Layer() method now checks the hash-based layer cache (which already includes DiskSizeMB in its hash key) instead of always returning Cached: true. Repeated builds with the same from-template + disk-size combination remain cached.

Structural changes:

- Extract NBD utilities (Cleaner, BuildDevice, TemplateRootfs, GetNBDDevice, MountNBDDevice) from pkg/sandbox/nbd/testutils into a new production package pkg/sandbox/nbd/nbdutil. The testutils package now re-exports from nbdutil for backward compatibility. Test-only types (ZeroDevice, LoggerOverlay) remain in testutils.
- Export ProvisioningFiles() from pkg/template/build/core/rootfs to allow both the OCI path (as OCI tar layers) and the new template path (as direct disk writes) to share the same file set. The existing additionalOCILayers() now delegates to ProvisioningFiles() internally.
- Add template_rootfs.go in the base build phase containing: buildLayerFromTemplate(), materializeTemplateRootfs(), copyFilesRsync(), and writeProvisioningFiles().
```go
	return os.RemoveAll(diffCacheDir)
})

flags, err := featureflags.NewClient()
```
featureflags.NewClient() creates a real LaunchDarkly LDClient (when LAUNCH_DARKLY_API_KEY is set) whose goroutines and connections are never released. featureflags.Client has a Close(ctx context.Context) error method; add a corresponding cleaner.Add step to avoid leaking one LD client per buildLayerFromTemplate invocation.
```go
cleaner.Add(func(cleanupCtx context.Context) error {
	<-poolClosed
```
<-poolClosed blocks unconditionally without selecting on cleanupCtx.Done(). If devicePool.Populate is slow to respond to context cancellation, this step hangs indefinitely and the 30-second timeout in Cleaner.Run has no effect on it.
```go
defer span.End()

// We use a separate context for NBD operations to avoid cleanup deadlocks on cancellation
nbdCtx := context.Background()
```
context.Background() means a cancelled build context (user abort, deadline) will not propagate to mnt.Open or devicePool.Populate, which can leave the build appearing stuck. context.WithoutCancel(ctx) would be a better choice here: it preserves trace/value propagation while still isolating NBD operations from cancellation-induced cleanup deadlocks. Note that WithoutCancel also drops cancellation, so on its own it fixes the lost trace context but not the stuck-build concern.
```go
}

// Remove existing file/symlink if present
os.Remove(fullLinkPath)
```
os.Remove(fullLinkPath) silently discards its error. If the path exists but cannot be removed (e.g. it is a non-empty directory or there is a permissions issue), the error is swallowed and the subsequent os.Symlink fails with EEXIST, obscuring the real cause. Only not-exist errors (os.IsNotExist) should be ignored; other errors should be returned.