Skip to content

perf(storage): zero-copy upload path from mmap to GCS#2394

Draft
ValentaTomas wants to merge 9 commits intomainfrom
perf/zero-copy-upload
Draft

perf(storage): zero-copy upload path from mmap to GCS#2394
ValentaTomas wants to merge 9 commits intomainfrom
perf/zero-copy-upload

Conversation

@ValentaTomas
Copy link
Copy Markdown
Member

@ValentaTomas ValentaTomas commented Apr 14, 2026

Upload memfile/rootfs directly from mmap to GCS, skipping CachePath→StoreFile→ReadAt→alloc-per-chunk. Stacked on #2393.

@cursor
Copy link
Copy Markdown

cursor bot commented Apr 14, 2026

PR Summary

Medium Risk
Touches core storage write paths and introduces new mmap lifetime/locking expectations, so bugs could cause upload failures or memory/lifecycle issues; changes are contained to upload/write code and have test updates, but impact multiple backends (GCS/AWS/FS/cache).

Overview
Switches template memfile/rootfs uploads from path-based StoreFile to an in-memory Store([]byte) flow, adding Data() accessors on mmap-backed caches/diffs/chunkers so uploads can use the existing mmap buffer (with a release function to prevent unmapping during use) and refactoring GCS multipart uploads to operate on byte slices instead of reopening/reading files; updates the seekable writer interface and implementations/mocks accordingly, and increases sudo test timeouts in CI to reduce flakiness.

Reviewed by Cursor Bugbot for commit 24f35a8. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d84ad827b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/shared/pkg/storage/storage_aws.go Outdated
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

placeholder

Comment thread packages/orchestrator/pkg/sandbox/build/local_diff.go Outdated
Comment thread packages/shared/pkg/storage/gcp_multipart.go
Comment thread packages/shared/pkg/storage/storage_aws.go Outdated
@ValentaTomas ValentaTomas changed the base branch from perf/minimal-multipart-fix to main April 14, 2026 22:26
@linear
Copy link
Copy Markdown

linear bot commented Apr 14, 2026

@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 6 times, most recently from 9a3ab85 to f451748 Compare April 14, 2026 23:18
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go
Add StoreData(ctx, []byte) to the storage interface and wire up the
template build upload path to use it. Memfile/rootfs uploads now go
directly from mmap-backed diff data to HTTP, skipping the previous
CachePath→StoreFile→ReadAt→alloc-per-chunk pipeline.
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch from c158470 to 00efb6a Compare April 14, 2026 23:51
Comment thread packages/orchestrator/pkg/sandbox/template_build.go
@ValentaTomas ValentaTomas assigned jakubno and unassigned dobrac Apr 15, 2026
The dataStorer type assertion in uploadDiff always failed because
OpenSeekable returns a cachedSeekable or peerSeekable wrapper, not
the inner gcpObject that implements StoreData. This made the zero-copy
upload path unreachable, always falling back to StoreFile.

Add StoreData forwarding to cachedSeekable and peerSeekable so the
type assertion succeeds through the wrapper chain.

Also release the mmap lock (from diff.Data) before falling back to
StoreFile, since that path reads from disk and doesn't use the mmap.
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/shared/pkg/storage/storage_google.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/block/cache.go Outdated
}

// StoreData forwards to the inner storage provider for zero-copy upload from mmap'd data.
func (c *cachedSeekable) StoreData(ctx context.Context, data []byte) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we ever want to start saving data to outer layer (NFS cache) this path is missing that

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the experiment with this didn't much work out as the NFS cache can't scale in these situations, so we will probably solve that differently.

Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template/peerclient/seekable.go Outdated
- Extract tryZeroCopyUpload helper so defer release() works safely
  without risking deadlock on the StoreFile fallback path
- Fix zero-size diff: when Data() returns nil, fall through to
  StoreFile instead of silently skipping the upload
- Unify dataStorer interface as storage.DataStorer in one place
- Fix gcsOperationAttr labels in StoreData (WriteFromData instead
  of WriteFromFileSystem)
- Include concrete type in assertion error messages for debugging
- Nit: remove redundant []byte() cast on mmap in cache.Data()
Comment thread packages/orchestrator/pkg/sandbox/template_build.go Outdated
- Skip upload entirely when diff is *NoDiff (zero-size file), fixing
  the regression where NoDiff.CachePath() returned NoDiffError on the
  StoreFile fallback path
- Add StoreDataViaFile fallback in cachedSeekable and peerSeekable so
  StoreData gracefully degrades via a temp file + StoreFile when the
  inner storage doesn't implement DataStorer, instead of returning
  an error
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/template_build.go
Comment thread packages/shared/pkg/storage/storage_cache_seekable_test.go
Comment thread packages/shared/pkg/storage/storage_cache_seekable.go
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 2 times, most recently from 24d42d2 to f7a9540 Compare April 16, 2026 07:25
@ValentaTomas ValentaTomas marked this pull request as draft April 16, 2026 07:30
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f7a9540. Configure here.

Comment thread packages/orchestrator/pkg/sandbox/block/chunk.go
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch 4 times, most recently from 4eeaa48 to 7d1cdf9 Compare April 16, 2026 07:39
…ap upload

Replace StoreFile(ctx, path) with Store(ctx, []byte) across all storage
backends. The mmap-backed diff data now flows directly to GCS multipart
upload as sub-slices, avoiding the previous CachePath→file open→ReadAt→
alloc-per-chunk pipeline.

This merges the three-function multipart upload path
(UploadFileInParallel + uploadParts + UploadData) into a single
Upload(ctx, []byte, concurrency) method, and eliminates the separate
DataStorer interface and StoreData methods on all wrappers.
@ValentaTomas ValentaTomas force-pushed the perf/zero-copy-upload branch from 7d1cdf9 to a43759c Compare April 16, 2026 07:39
The orchestrator smoke test runs Firecracker VMs that need network
access. Without an explicit timeout, Go's default 10m applies, which
is too short when the VM has network delays.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants