
fix(sqlite-native): keep truncate cache coherent#4637

Open
NathanFlurry wants to merge 1 commit into 04-12-fix_sqlite-vfs_use_delete_range_for_truncate_cleanup from 04-12-fix_sqlite-native_keep_truncate_cache_coherent

Conversation

@NathanFlurry
Member

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Member Author

NathanFlurry commented Apr 13, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Apr 13, 2026

Code Review: PR #4637 fix(sqlite-native): keep truncate cache coherent

Summary

This PR fixes two bugs in the kv_io_truncate function in the native Rust SQLite VFS:

  1. Off-by-one in the read-cache eviction formula: The old code computed truncate_from_chunk as (size / CHUNK_SIZE) + 1 and kept chunks with chunk_idx < truncate_from_chunk. When size is an exact multiple of CHUNK_SIZE (e.g., 4096), this incorrectly retained the chunk immediately past the new end of file, which had just been deleted from KV, leaving a stale entry in the cache.

  2. Missing cache update for the truncated boundary chunk: After writing the shortened boundary chunk to KV, the old code never updated the in-memory read cache. On the next read, the cache would serve the old, longer version of that chunk, causing reads past the new logical file end to see stale bytes.

Both fixes are logically correct. The PR also reorganises variable declarations to compute last_chunk_to_keep and last_existing_chunk earlier so the cache-eviction block can use a consistent formula.


What Is Correct

Fix 1 — Cache eviction boundary (off-by-one).

The new formula matches the formula used everywhere else in this function and in the WASM counterpart (Math.floor((size - 1) / CHUNK_SIZE)). The old formula kept one too many chunks whenever size was a multiple of CHUNK_SIZE.
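
The difference between the two eviction rules can be illustrated with a small sketch. It assumes CHUNK_SIZE is 4096 bytes; the helper names old_keeps_chunk and new_keeps_chunk are hypothetical and not taken from the PR, and the handling of a zero-length file is likewise an assumption.

```rust
/// Chunk size in bytes. 4096 is assumed here for illustration.
const CHUNK_SIZE: u64 = 4096;

/// Old rule as described in the review: keep chunks with
/// chunk_idx < (size / CHUNK_SIZE) + 1.
fn old_keeps_chunk(size: u64, chunk_idx: u64) -> bool {
    chunk_idx < (size / CHUNK_SIZE) + 1
}

/// Corrected rule: the last chunk to keep is floor((size - 1) / CHUNK_SIZE).
/// A zero-length file keeps no chunks (assumption for this sketch).
fn new_keeps_chunk(size: u64, chunk_idx: u64) -> bool {
    size > 0 && chunk_idx <= (size - 1) / CHUNK_SIZE
}
```

For size = 4096 the old rule still keeps chunk 1, while the corrected rule keeps only chunk 0. For a non-aligned size such as 5000 the two rules agree.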

Fix 2 — Read-cache update for the truncated boundary chunk.

After the shortened chunk is written to KV, the PR inserts the new, shorter Vec into the read cache only after the KV write succeeds, keeping the two stores coherent. Correct.
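
A minimal sketch of the write-then-cache ordering described above, under stated assumptions: Kv, its fail flag, and truncate_boundary_chunk are hypothetical stand-ins for illustration, not the PR's actual types or signatures.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the KV backend. `fail` simulates a write error.
struct Kv {
    data: HashMap<u64, Vec<u8>>,
    fail: bool,
}

impl Kv {
    fn put(&mut self, chunk_idx: u64, value: Vec<u8>) -> Result<(), ()> {
        if self.fail {
            return Err(());
        }
        self.data.insert(chunk_idx, value);
        Ok(())
    }
}

/// Shorten the boundary chunk to `keep_len` bytes in KV, then mirror the
/// shortened bytes into the read cache only if the KV write succeeded.
fn truncate_boundary_chunk(
    kv: &mut Kv,
    read_cache: &mut HashMap<u64, Vec<u8>>,
    chunk_idx: u64,
    keep_len: usize,
) -> Result<(), ()> {
    let mut chunk = kv.data.get(&chunk_idx).cloned().unwrap_or_default();
    chunk.truncate(keep_len);
    // On failure, `?` returns early and the cache is left untouched.
    kv.put(chunk_idx, chunk.clone())?;
    read_cache.insert(chunk_idx, chunk);
    Ok(())
}
```

Because the cache insert sits after the fallible put, a failed write never leaves the cache holding bytes that KV does not have.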


Issues

1. Cache eviction runs before the failable kv_put (Low)

The read-cache retain call now happens before file.size is updated and before the kv_put that writes the new metadata. If kv_put fails, file.size and file.meta_dirty are rolled back — but the evicted cache entries are not restored.

After this error path, file.size is the old size, so subsequent reads will try to read chunks that the cache no longer holds. They fall through to a fresh KV fetch, so this does not cause corruption — the chunks are still in KV. But it is inconsistent with the rest of the function's error-handling pattern.

Moving the retain call to after the successful kv_put would make the error-handling contract consistent:

```rust
// ... compute last_chunk_to_keep / last_existing_chunk ...

let previous_size = file.size;
let previous_meta_dirty = file.meta_dirty;
file.size = size;
file.meta_dirty = true;
if ctx.kv_put(...).is_err() {
    file.size = previous_size;
    file.meta_dirty = previous_meta_dirty;
    return SQLITE_IOERR_TRUNCATE;
}
file.meta_dirty = false;

// Evict AFTER metadata write succeeds
if let Some(read_cache) = get_file_state(file.state).read_cache.as_mut() {
    read_cache.retain(|key, _| ...);
}
```

2. No tests for the read-cache truncation paths (Low)

The read cache is opt-in (RIVETKIT_SQLITE_NATIVE_READ_CACHE=1) and off by default, so both bugs are only observable when it is enabled. There are no new tests that exercise a truncate-then-read sequence with the cache enabled. A test that:

  1. Enables the read cache
  2. Writes data spanning multiple chunks
  3. Truncates to a non-chunk-aligned size (and, separately, to an exact multiple of CHUNK_SIZE to exercise the off-by-one case)
  4. Reads back the truncated region

...would prevent regressions for both bugs, especially given the WASM/native parity requirement.
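
The steps above could look roughly like the following sketch. It runs against a hypothetical in-memory model (ModelFile, write_all, read_chunk, and a CHUNK_SIZE of 4096 are all assumptions made for illustration), not against the real kv_io_truncate or the project's test harness, so it only shows the shape of the assertions.

```rust
use std::collections::HashMap;

const CHUNK_SIZE: usize = 4096; // assumed chunk size

/// Hypothetical model of a chunked file with an optional read cache.
struct ModelFile {
    kv: HashMap<usize, Vec<u8>>,
    read_cache: Option<HashMap<usize, Vec<u8>>>,
    size: usize,
}

impl ModelFile {
    fn new(cache_enabled: bool) -> Self {
        ModelFile {
            kv: HashMap::new(),
            read_cache: if cache_enabled { Some(HashMap::new()) } else { None },
            size: 0,
        }
    }

    /// Whole-file write starting at offset 0, split into chunks.
    fn write_all(&mut self, data: &[u8]) {
        for (idx, chunk) in data.chunks(CHUNK_SIZE).enumerate() {
            self.kv.insert(idx, chunk.to_vec());
            if let Some(cache) = self.read_cache.as_mut() {
                cache.insert(idx, chunk.to_vec());
            }
        }
        self.size = data.len();
    }

    fn truncate(&mut self, size: usize) {
        if size >= self.size {
            return;
        }
        // Index of the last chunk that still holds data, None for an empty file.
        let last_keep = if size == 0 { None } else { Some((size - 1) / CHUNK_SIZE) };
        let keep = |idx: &usize| last_keep.map_or(false, |last| *idx <= last);
        self.kv.retain(|idx, _| keep(idx));
        if let Some(cache) = self.read_cache.as_mut() {
            cache.retain(|idx, _| keep(idx));
        }
        if let Some(last) = last_keep {
            let keep_len = size - last * CHUNK_SIZE;
            if let Some(chunk) = self.kv.get_mut(&last) {
                chunk.truncate(keep_len);
                let shortened = chunk.clone();
                // Mirror the shortened boundary chunk into the cache.
                if let Some(cache) = self.read_cache.as_mut() {
                    cache.insert(last, shortened);
                }
            }
        }
        self.size = size;
    }

    /// Return chunk `idx` as stored, preferring the cache over KV,
    /// so that a stale cache entry would be visible to the caller.
    fn read_chunk(&self, idx: usize) -> Vec<u8> {
        let cached = self.read_cache.as_ref().and_then(|c| c.get(&idx));
        cached.or_else(|| self.kv.get(&idx)).cloned().unwrap_or_default()
    }
}

/// Steps 1-4 from the list, plus a chunk-aligned truncate.
fn truncate_then_read_with_cache() {
    let mut file = ModelFile::new(true); // 1. enable the read cache
    let data: Vec<u8> = (0..3 * CHUNK_SIZE).map(|i| (i % 251) as u8).collect();
    file.write_all(&data); // 2. data spanning three chunks

    file.truncate(CHUNK_SIZE + 100); // 3. non-chunk-aligned size
    // 4. the boundary chunk is shortened and the chunk past the end is gone
    assert_eq!(file.read_chunk(1), data[CHUNK_SIZE..CHUNK_SIZE + 100].to_vec());
    assert!(file.read_chunk(2).is_empty());

    file.truncate(CHUNK_SIZE); // chunk-aligned case (off-by-one regression)
    assert_eq!(file.read_chunk(0).len(), CHUNK_SIZE);
    assert!(file.read_chunk(1).is_empty());
}
```

A real test would replace the model with the native VFS running with RIVETKIT_SQLITE_NATIVE_READ_CACHE=1 and keep the same assertions.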

3. Boundary-chunk path re-fetches from KV even when cache is warm (Nit, pre-existing)

When the truncation target is not chunk-aligned, the code issues a fresh kv_get for the boundary chunk even when the read cache is enabled and may already hold it. Not introduced by this PR, but worth a follow-up.
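
One possible shape for that follow-up is a cache-first lookup, sketched here under assumptions: load_boundary_chunk is a hypothetical helper, and the kv_get closure is a stand-in rather than the real ctx.kv_get signature.

```rust
use std::collections::HashMap;

/// Hypothetical helper: consult the read cache first and fall back to a
/// KV fetch only on a miss (or when the cache is disabled).
fn load_boundary_chunk<F>(
    read_cache: Option<&HashMap<u64, Vec<u8>>>,
    chunk_idx: u64,
    kv_get: F,
) -> Option<Vec<u8>>
where
    F: FnOnce(u64) -> Option<Vec<u8>>,
{
    if let Some(hit) = read_cache.and_then(|cache| cache.get(&chunk_idx)) {
        return Some(hit.clone());
    }
    kv_get(chunk_idx)
}
```

This only pays off if the cache is guaranteed to be coherent with KV at that point, which is the property this PR is trying to establish.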


WASM Parity

The WASM VFS does not have a read cache, so the cache-eviction fix has no WASM counterpart. The boundary-chunk kv_put in the WASM path also does not update any cache, which is consistent. CLAUDE.md's native/WASM parity requirement is satisfied.


Summary

| Severity | Issue |
|---|---|
| Low | Cache eviction runs before the failable kv_put; on failure the cache is not restored (no corruption, only extra cache misses). Move eviction to after the successful write. |
| Low | No tests covering the read_cache truncation paths. |
| Nit | Boundary-chunk path re-fetches from KV even when the read cache is warm (pre-existing). |

The core logic of both fixes is correct. The main concern is the ordering of cache eviction relative to the failable kv_put. Since the read cache is disabled by default, the practical impact is minimal, but fixing the ordering would make the error-handling contract consistent across the function.

NathanFlurry marked this pull request as ready for review on April 13, 2026 05:18
NathanFlurry force-pushed the 04-12-fix_sqlite-native_keep_truncate_cache_coherent branch from 4079bac to 459dd12 on April 13, 2026 05:38
NathanFlurry force-pushed the 04-12-fix_sqlite-vfs_use_delete_range_for_truncate_cleanup branch from 4303e83 to 60882a2 on April 13, 2026 05:38
NathanFlurry force-pushed the 04-12-fix_sqlite-native_keep_truncate_cache_coherent branch from 459dd12 to d7b2c43 on April 13, 2026 05:50
NathanFlurry force-pushed the 04-12-fix_sqlite-vfs_use_delete_range_for_truncate_cleanup branch from 5f174d0 to c75356d on April 13, 2026 07:03
NathanFlurry force-pushed the 04-12-fix_sqlite-native_keep_truncate_cache_coherent branch from d7b2c43 to 6c17fae on April 13, 2026 07:03