fix: isolate build cache per build-uuid to prevent EOF errors #148

Merged
vigneshrajsb merged 3 commits into main from fix/per-uuid-build-cache
Mar 27, 2026
vigneshrajsb commented Mar 26, 2026

Summary

  • Appends buildUuid (environment identifier, e.g., dev-0, wandering-credit-924200) to the buildkit cache ref for non-ECR registries (distribution), so each environment/PR gets its own isolated cache entry
  • Previously, all builds of the same service shared a single :cache tag, so concurrent builds raced on writes and corrupted cache manifests
  • The corrupted manifests caused roughly 40 buildkit "unexpected EOF" errors every 2 days across builds

Before: distribution.../repo/service-name:cache (shared across all PRs)
After: distribution.../repo/service-name/build-uuid:cache (isolated per environment)

Changes

  • Add buildUuid to NativeBuildOptions, passed from deploy.build.uuid
  • Replace two-step cache ref patching with single appendCacheRefSegments(cacheRef, serviceName, buildUuid)
  • When buildUuid is undefined, falls back to service-name-only cache ref (backward compatible)
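
A minimal sketch of what a helper like appendCacheRefSegments could look like, based only on the before/after refs above. This is not the merged implementation: the parsing strategy and the `registry.example.com/repo` placeholder are assumptions, and the sketch expects the ref to contain at least one `/` before any tag.

```typescript
// Illustrative sketch only; the real helper in this PR may differ.
// Inserts serviceName and (optionally) buildUuid as path segments
// between the repository path and the tag of a cache ref.
function appendCacheRefSegments(
  cacheRef: string,
  serviceName: string,
  buildUuid?: string
): string {
  // Only look for the tag separator after the last "/", so a registry
  // port such as "localhost:5000" is not mistaken for a tag.
  const lastSlash = cacheRef.lastIndexOf("/");
  const tagColon = cacheRef.indexOf(":", lastSlash + 1);

  const repo = tagColon === -1 ? cacheRef : cacheRef.slice(0, tagColon);
  const tag = tagColon === -1 ? "" : cacheRef.slice(tagColon);

  // Drop buildUuid when it is undefined or empty, which yields the
  // service-name-only ref described as the fallback.
  const segments = [serviceName, buildUuid].filter(
    (s): s is string => Boolean(s)
  );

  return [repo, ...segments].join("/") + tag;
}

// "registry.example.com/repo" is a placeholder, not the real registry.
const scoped = appendCacheRefSegments(
  "registry.example.com/repo:cache",
  "service-name",
  "dev-0"
);
// scoped === "registry.example.com/repo/service-name/dev-0:cache"

const fallback = appendCacheRefSegments(
  "registry.example.com/repo:cache",
  "service-name",
  undefined
);
// fallback === "registry.example.com/repo/service-name:cache"
```

Building the full path in one call keeps the tag handling in a single place, which is presumably why it replaces the earlier two-step patching.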

Trade-offs

  • The first build in each new environment starts with a cold cache (there is no pre-existing cache to import)
  • Subsequent builds in the same environment will hit their own isolated cache
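
The trade-off follows from importing and exporting against the same scoped ref. The sketch below illustrates that with buildctl's `--import-cache` and `--export-cache` flags and the `type=registry,ref=...` form. Whether this project invokes buildkit through these exact flags is an assumption, and `cacheFlags` and the example refs are hypothetical names.

```typescript
// Hypothetical helper: builds registry cache flags for a single ref.
// Import and export use the same ref, so a brand-new environment has
// nothing to import on its first build and warms its own cache after.
function cacheFlags(cacheRef: string): string[] {
  return [
    "--import-cache",
    `type=registry,ref=${cacheRef}`,
    "--export-cache",
    `type=registry,ref=${cacheRef}`,
  ];
}

// Two environments building the same service no longer touch the same tag.
const flagsA = cacheFlags("registry.example.com/repo/service-name/dev-0:cache");
const flagsB = cacheFlags("registry.example.com/repo/service-name/dev-1:cache");
```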

Test plan

  • Existing buildkit tests updated and passing (24/24)
  • New test for buildUuid:undefined fallback
  • Lint passes
  • No new TypeScript errors (all ts-check errors are pre-existing)
  • Deploy to staging and verify builds create uuid-scoped cache refs in distribution registry
  • Monitor for EOF errors over 48h to confirm reduction

…ite corruption

Multiple concurrent builds of the same service (e.g., 130 open PRs for
next-web) were all reading/writing the same :cache tag in the distribution
registry, causing race conditions that corrupted cache manifests and led
to buildkit "unexpected EOF" errors during COPY --from steps.

Appends deployUuid to the cache ref for non-ECR registries so each
environment/PR gets its own isolated cache entry.

Before: distribution.../next-web/service-name:cache (shared)
After:  distribution.../next-web/service-name/deploy-uuid:cache (per-env)

vigneshrajsb requested a review from a team as a code owner March 26, 2026 23:41

- Add buildUuid to NativeBuildOptions and pass build.uuid from deploy.ts
- Replace two-step cache ref patching with single appendCacheRefSegments()
  that builds the full path (serviceName + buildUuid) in one call
- Cache ref: distribution.../repo/service-name/build-uuid:cache

Before: all PRs shared one cache ref per service (concurrent write races)
After: each environment gets its own isolated cache ref

- Fix stale comment referencing deployUuid instead of buildUuid
- Replace duplicate test with buildUuid:undefined fallback test
- Add inline comment explaining why cache is scoped per build-uuid

vigneshrajsb changed the title from "fix: isolate build cache per deploy-uuid to prevent EOF errors" to "fix: isolate build cache per build-uuid to prevent EOF errors" Mar 27, 2026
vigneshrajsb merged commit 887d73d into main Mar 27, 2026
1 check passed