refactor(block): simplify Chunker to single-block Block#2404

Draft
levb wants to merge 3 commits into lev-compression-final from lev-compression-simplify-slice

Conversation

@levb
Contributor

@levb levb commented Apr 15, 2026

No description provided.


@claude claude bot left a comment


Review with inline comments on seekable.go and storage_diff.go

Comment thread packages/orchestrator/pkg/sandbox/template/peerserver/seekable.go
Comment thread packages/orchestrator/pkg/sandbox/build/storage_diff.go Outdated

// P2P always serves uncompressed bytes — pass nil FrameTable.
- data, err := f.diff.Slice(ctx, offset, length, nil)
+ data, err := f.diff.Block(ctx, offset, nil)


🔴 seekableSource.Stream now calls diff.Block(ctx, offset, nil) which always returns exactly one block (min(blockSize, size-offset) bytes), completely ignoring the length parameter. In production the peer client requests MemoryChunkSize (~4 MB) per chunk, but Stream delivers only one 4 KB block; the Chunker on the receiving side calls io.ReadFull expecting 4 MB and gets io.ErrUnexpectedEOF, breaking every P2P uncompressed diff read. The fix is to loop over Block() calls until length bytes have been served, or reintroduce a Slice(ctx, offset, length, ft) path.

Extended reasoning...

What the bug is and how it manifests

Before this PR, seekableSource.Stream called diff.Slice(ctx, offset, length, nil), which returned exactly length bytes from the diff. After the refactor it calls diff.Block(ctx, offset, nil), which always returns min(blockSize, size-offset) bytes — exactly one block, regardless of the length argument. The length parameter received by Stream is now completely unused.

The specific code path that triggers it

  1. The peer client (peerclient/seekable.go) calls OpenRangeReader with length = MemoryChunkSize (~4 MB for uncompressed diffs).
  2. This triggers a gRPC ReadAtBuildSeekable RPC with Length=4 MB.
  3. The gRPC server (chunks.go:152) calls src.Stream(ctx, offset, length=4 MB, sender).
  4. Stream calls diff.Block(ctx, offset, nil) which returns min(blockSize=4 KB, size-offset) bytes — one block.
  5. Only 4 KB is written to the gRPC stream before it closes.
  6. On the client side, peerStreamReader returns io.EOF after delivering those 4 KB.
  7. Chunker.progressiveRead calls io.ReadFull(reader, mmapSlice[0:4 MB]), gets 4 KB then io.ErrUnexpectedEOF.
  8. totalRead (4 KB) < fetchLen (4 MB), so the trailing-EOF guard does not trigger and an error is returned, failing the entire chunk fetch.

Why existing tests don't catch it

TestSeekableSource_Stream uses data=[]byte("diff bytes") (10 bytes) and sets BlockSize()=len(data)=10, making BlockSize == len(data) == requested length. In this degenerate case Block() happens to return all 10 bytes, matching what Slice() would have returned. In production blockSize=4 KB and requested lengths are up to 4 MB, so the mismatch is always triggered.

Impact

Every P2P uncompressed diff read where the requested length exceeds one block size fails with io.ErrUnexpectedEOF. Since MemoryChunkSize (4 MB) >> blockSize (4 KB), this is every such read in production, completely breaking P2P uncompressed diff fetches.

How to fix it

Stream needs to loop over Block() calls to serve the full requested length, advancing offset by blockSize each iteration until length bytes have been delivered. Alternatively, a dedicated Slice(ctx, offset, length, ft) method returning an arbitrary byte range should be reintroduced on the Diff interface.

Step-by-step proof

  • blockSize = 4096 (4 KB), MemoryChunkSize = 4*1024*1024 = 4194304 (4 MB)
  • Peer client: OpenRangeReader(ctx, off=0, length=4194304)
  • gRPC server: Stream(ctx, offset=0, length=4194304, sender)
  • Stream: data, _ := diff.Block(ctx, 0, nil) → returns 4096 bytes (one block)
  • gRPC stream sends 4096 bytes then closes
  • Client io.ReadFull(reader, buf[0:4194304]) → n=4096, err=io.ErrUnexpectedEOF
  • totalRead=4096 < fetchLen=4194304 → error returned
  • Result: chunk fetch fails, sandbox P2P transfer is broken

levb added 2 commits April 15, 2026 13:22
The Chunker is always called with exactly one block at a time: UFFD
sends page-sized reads, NBD's Overlay splits multi-block requests.
Codify this invariant and simplify the interface:

- Replace FramedSlicer/Slice with Blocker/Block — returns exactly one
  block, computes min(blockSize, remaining) internally, no length param
- Remove unused FramedReader interface and Chunker.ReadAt (zero callers)
- Validate offset alignment and bounds at Block() entry
- Add fetchSession.contains() for session lookup
- Rename chunkOff/chunkLen → fetchOff/fetchLen, locateChunk → getFetchRange
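The validate-then-clamp logic the commit describes can be sketched as a small range helper; the function name and error messages here are illustrative, not the actual implementation:

```go
package main

import "fmt"

// blockRange sketches the Block() entry invariant: reject misaligned or
// out-of-bounds offsets, then return min(blockSize, size-offset) bytes.
func blockRange(offset, size, blockSize int64) (int64, error) {
	if offset%blockSize != 0 {
		return 0, fmt.Errorf("offset %d not aligned to block size %d", offset, blockSize)
	}
	if offset < 0 || offset >= size {
		return 0, fmt.Errorf("offset %d out of bounds [0, %d)", offset, size)
	}
	n := blockSize
	if remaining := size - offset; remaining < n {
		n = remaining // the final block of the diff may be short
	}
	return n, nil
}

func main() {
	n, _ := blockRange(8192, 10000, 4096) // last, short block
	fmt.Println(n)                        // 1808
}
```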
Diff now exposes only Block (via Blocker) as its data-read primitive.
Multi-block reads are composed via the new block.ReadBlocks helper.

The previous commit replaced Slice with Block but missed that P2P
Stream called Slice with length up to 4MB (MemoryChunkSize), not just
one block. Stream now loops over Block calls to serve the full
requested range. Added TestSeekableSource_Stream_MultiBlock to cover
this case.
@levb levb force-pushed the lev-compression-simplify-slice branch from e3130bf to 967d757 on April 15, 2026 20:22
@levb
Contributor Author

levb commented Apr 15, 2026

@claude re-review

@levb levb marked this pull request as draft April 15, 2026 20:29
IterBlocks is the single block-iteration primitive. ReadBlocks fills
a buffer via IterBlocks. P2P Stream passes sender.Send directly as
the callback — the signatures match exactly (func([]byte) error).
@levb levb force-pushed the lev-compression-simplify-slice branch from 9524878 to 9b77dde on April 15, 2026 20:53