
Support for inline-beta filtered search with expressions #782

Open
gopalrs wants to merge 34 commits into main from sync-from-cdb-diskann

@gopalrs gopalrs commented Feb 16, 2026

This PR makes the following changes:

  • Adds support for inline-beta search with filter expressions. The expressions support AND, OR, and equality comparisons.

  • Adds a benchmark that evaluates performance and recall on a small dataset and also serves as an example of how to set up filtered search with expressions.

  • Refactors the recall utilities in diskann-benchmark.

  • Updates the tokio utilities.

  • Adds attribute and format parser improvements in label-filter.

  • Updates the ground_truth utilities in diskann-tools.
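
As a rough illustration of what a filter expression with AND, OR, and equality comparisons involves, here is a minimal, self-contained sketch. The names `FilterExpr`, `Attributes`, `eq`, and `matches` are hypothetical and are not taken from the diskann-label-filter crate; the one-string-value-per-field attribute model is also an assumption made for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical filter AST: equality leaves combined with AND / OR.
#[derive(Debug, Clone)]
enum FilterExpr {
    Eq { field: String, value: String },
    And(Vec<FilterExpr>),
    Or(Vec<FilterExpr>),
}

/// Assumed attribute model: one string value per field.
type Attributes = HashMap<String, String>;

/// Convenience constructor for an equality leaf.
fn eq(field: &str, value: &str) -> FilterExpr {
    FilterExpr::Eq {
        field: field.to_string(),
        value: value.to_string(),
    }
}

/// Returns true when `attrs` satisfies `expr`.
fn matches(expr: &FilterExpr, attrs: &Attributes) -> bool {
    match expr {
        FilterExpr::Eq { field, value } => attrs.get(field) == Some(value),
        FilterExpr::And(children) => children.iter().all(|c| matches(c, attrs)),
        FilterExpr::Or(children) => children.iter().any(|c| matches(c, attrs)),
    }
}

fn main() {
    let attrs: Attributes = [("color", "red"), ("size", "L")]
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();

    // color == "red" AND (size == "M" OR size == "L")
    let expr = FilterExpr::And(vec![
        eq("color", "red"),
        FilterExpr::Or(vec![eq("size", "M"), eq("size", "L")]),
    ]);
    println!("matches: {}", matches(&expr, &attrs)); // prints "matches: true"
}
```

In this sketch an empty `And` matches every document and an empty `Or` matches none, which follows from the semantics of `all` and `any`; the PR may handle those edge cases differently.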

Copilot AI left a comment

Pull request overview

This PR integrates label-filtered (“document”) insertion and inline beta filtered search into the DiskANN benchmark/tooling flow, enabling benchmarks that operate on { vector, attributes } documents and evaluate filtered queries.

Changes:

  • Added DocumentInsertStrategy and supporting public types to insert/query Document objects (vector + attributes) through DocumentProvider.
  • Extended inline beta filter search to handle predicate encoding failures and added a constructor for InlineBetaStrategy.
  • Added a new benchmark input/backend (document-index-build) plus example config for running document + filter benchmarks.
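
The optional filter encoding mentioned above can be pictured with a small sketch. Everything here is an assumption made for illustration: `LabelDictionary`, `encode_eq`, and `adjusted_distance` are hypothetical names, the beta scaling rule (multiply the distance of matching candidates by a factor `beta` in (0, 1]) is a common description of beta-filtered search rather than something this page states, and treating an unencodable filter as "no filter" is only one possible fallback.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical dictionary mapping (field, value) pairs to label ids.
struct LabelDictionary {
    ids: HashMap<(String, String), u32>,
}

impl LabelDictionary {
    /// Encodes an equality predicate; `None` means the attribute was never
    /// seen at insert time, so the filter cannot be encoded.
    fn encode_eq(&self, field: &str, value: &str) -> Option<u32> {
        self.ids
            .get(&(field.to_string(), value.to_string()))
            .copied()
    }
}

/// Assumed scoring rule: candidates that match the filter have their
/// distance scaled by `beta`, so they are preferred during the search.
fn adjusted_distance(
    raw: f32,
    candidate_labels: &HashSet<u32>,
    encoded_filter: Option<u32>,
    beta: f32,
) -> f32 {
    match encoded_filter {
        // Fast path (assumed fallback): no valid filter, plain distance.
        None => raw,
        Some(id) if candidate_labels.contains(&id) => raw * beta,
        Some(_) => raw,
    }
}

fn main() {
    let mut ids = HashMap::new();
    ids.insert(("color".to_string(), "red".to_string()), 7u32);
    let dict = LabelDictionary { ids };

    let labels: HashSet<u32> = vec![7u32].into_iter().collect();
    let known = dict.encode_eq("color", "red"); // Some(7)
    let unknown = dict.encode_eq("color", "blue"); // None

    println!("{}", adjusted_distance(2.0, &labels, known, 0.5)); // prints "1"
    println!("{}", adjusted_distance(2.0, &labels, unknown, 0.5)); // prints "2"
}
```

Representing the encoded filter as an `Option` keeps the "could not encode" case explicit at the type level, so the scoring code has to decide what to do with it instead of failing inside the encoder.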

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| test_data/disk_index_search/data.256.label.jsonl | Updates LFS pointer for label test data used in filter benchmarks. |
| diskann-tools/src/utils/ground_truth.rs | Adds array-aware label matching/expansion and extensive tracing diagnostics for filter ground-truth generation. |
| diskann-tools/Cargo.toml | Adds serde_json dependency (and adjusts manifest metadata). |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Adds `Vec<T>` query support for full-precision in-mem provider (for inline beta usage). |
| diskann-label-filter/src/lib.rs | Exposes the new document_insert_strategy module under encoded_attribute_provider. |
| diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs | Adds InlineBetaStrategy::new and introduces is_valid_filter fast-path logic. |
| diskann-label-filter/src/inline_beta_search/encoded_document_accessor.rs | Adjusts filter encoding to be optional and threads is_valid_filter into the query computer. |
| diskann-label-filter/src/encoded_attribute_provider/roaring_attribute_store.rs | Makes RoaringAttributeStore public for cross-crate use. |
| diskann-label-filter/src/encoded_attribute_provider/encoded_filter_expr.rs | Changes encoded filter representation to Option, allowing “invalid filter” fallback behavior. |
| diskann-label-filter/src/encoded_attribute_provider/document_provider.rs | Allows vector types used in documents to be ?Sized. |
| diskann-label-filter/src/encoded_attribute_provider/document_insert_strategy.rs | New strategy wrapper enabling insertion/search over Document values. |
| diskann-label-filter/src/encoded_attribute_provider/ast_label_id_mapper.rs | Simplifies lookup error messaging and signature for attribute→id mapping. |
| diskann-label-filter/src/document.rs | Makes Document generic over ?Sized vectors. |
| diskann-benchmark/src/utils/tokio.rs | Adds a reusable multi-thread Tokio runtime builder. |
| diskann-benchmark/src/utils/recall.rs | Re-exports knn recall helper for benchmark use. |
| diskann-benchmark/src/inputs/mod.rs | Registers a new document_index input module. |
| diskann-benchmark/src/inputs/document_index.rs | New benchmark input schema for document-index build + filtered search runs. |
| diskann-benchmark/src/backend/mod.rs | Registers new document_index backend benchmarks. |
| diskann-benchmark/src/backend/index/result.rs | Extends search result reporting with query count and wall-clock summary columns. |
| diskann-benchmark/src/backend/document_index/mod.rs | New backend module entrypoint for document index benchmarks. |
| diskann-benchmark/src/backend/document_index/benchmark.rs | New end-to-end benchmark: build via DocumentInsertStrategy + filtered search via InlineBetaStrategy. |
| diskann-benchmark/example/document-filter.json | Adds example job configuration for document filter benchmark runs. |
| Cargo.lock | Adds serde_json to the lockfile dependencies. |
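
Since the benchmark reports recall against filtered ground truth, a short sketch of a recall@k computation may help readers interpret the numbers. This is not the helper re-exported in diskann-benchmark/src/utils/recall.rs; `recall_at_k` is a hypothetical function, and dividing by the number of available ground-truth ids (which can be fewer than k when a filter is selective) is an assumption.

```rust
use std::collections::HashSet;

/// Recall@k for one query: the fraction of the top-k ground-truth ids that
/// appear among the top-k returned ids. Returns `None` when the (filtered)
/// ground truth is empty, because recall is undefined in that case.
fn recall_at_k(ground_truth: &[u32], results: &[u32], k: usize) -> Option<f64> {
    let truth: HashSet<u32> = ground_truth.iter().take(k).copied().collect();
    if truth.is_empty() {
        return None;
    }
    // Deduplicate the results so a repeated id is not counted twice.
    let found: HashSet<u32> = results.iter().take(k).copied().collect();
    let hits = truth.intersection(&found).count();
    Some(hits as f64 / truth.len() as f64)
}

fn main() {
    // Three of the four true neighbours were returned.
    println!("{:?}", recall_at_k(&[1, 2, 3, 4], &[2, 4, 9, 1], 4)); // prints "Some(0.75)"
    // A selective filter leaves only two matching points in the ground truth.
    println!("{:?}", recall_at_k(&[5, 6], &[6, 7, 8], 3)); // prints "Some(0.5)"
}
```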


@sampathrg sampathrg requested a review from hildebrandmw March 16, 2026 10:33

codecov-commenter commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 9.09091% with 750 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.49%. Comparing base (1b6ab6b) to head (a313013).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...-benchmark/src/backend/document_index/benchmark.rs | 1.71% | 516 Missing ⚠️ |
| ...ded_attribute_provider/document_insert_strategy.rs | 0.00% | 92 Missing ⚠️ |
| diskann-benchmark/src/inputs/document_index.rs | 10.00% | 81 Missing ⚠️ |
| diskann-tools/src/utils/ground_truth.rs | 56.17% | 39 Missing ⚠️ |
| ...ilter/src/inline_beta_search/inline_beta_filter.rs | 0.00% | 11 Missing ⚠️ |
| ...rc/inline_beta_search/encoded_document_accessor.rs | 0.00% | 4 Missing ⚠️ |
| ...oded_attribute_provider/roaring_attribute_store.rs | 0.00% | 3 Missing ⚠️ |
| diskann-label-filter/src/query.rs | 0.00% | 3 Missing ⚠️ |
| .../encoded_attribute_provider/encoded_filter_expr.rs | 0.00% | 1 Missing ⚠️ |
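
The headline patch figure can be cross-checked against the per-file counts listed above. The sketch below sums the "Missing" column and recomputes the percentage. The total of 825 patch lines is not reported by Codecov on this page; it is inferred here from 750 missing lines at 9.09091% coverage and should be read as an assumption.

```rust
/// Per-file "Missing" counts copied from the listing above.
const MISSING_PER_FILE: [u32; 9] = [516, 92, 81, 39, 11, 4, 3, 3, 1];

/// Patch coverage as a percentage of covered lines over total patch lines.
fn patch_coverage_pct(covered: u32, total: u32) -> f64 {
    100.0 * f64::from(covered) / f64::from(total)
}

fn main() {
    let missing: u32 = MISSING_PER_FILE.iter().sum();
    // Assumption: 825 patch lines, inferred from the reported percentage.
    let total = 825;
    let covered = total - missing;
    println!(
        "{} missing, {} covered, {:.5}%",
        missing,
        covered,
        patch_coverage_pct(covered, total)
    ); // prints "750 missing, 75 covered, 9.09091%"
}
```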
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #782      +/-   ##
==========================================
- Coverage   89.11%   88.49%   -0.62%     
==========================================
  Files         443      445       +2     
  Lines       83354    83807     +453     
==========================================
- Hits        74281    74169     -112     
- Misses       9073     9638     +565     
| Flag | Coverage Δ |
| --- | --- |
| miri | 88.49% <9.09%> (-0.62%) ⬇️ |
| unittests | 88.34% <9.09%> (-0.62%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-benchmark/src/backend/mod.rs | 100.00% <100.00%> (ø) |
| diskann-benchmark/src/inputs/mod.rs | 83.33% <100.00%> (+0.57%) ⬆️ |
| diskann-benchmark/src/utils/recall.rs | 62.50% <ø> (ø) |
| diskann-label-filter/src/document.rs | 0.00% <ø> (ø) |
| .../encoded_attribute_provider/ast_label_id_mapper.rs | 97.44% <100.00%> (-0.03%) ⬇️ |
| ...rc/encoded_attribute_provider/document_provider.rs | 0.00% <ø> (ø) |
| .../encoded_attribute_provider/encoded_filter_expr.rs | 0.00% <0.00%> (ø) |
| ...oded_attribute_provider/roaring_attribute_store.rs | 72.94% <0.00%> (-2.67%) ⬇️ |
| diskann-label-filter/src/query.rs | 0.00% <0.00%> (ø) |
| ...rc/inline_beta_search/encoded_document_accessor.rs | 0.00% <0.00%> (ø) |
... and 5 more

... and 14 files with indirect coverage changes

@sampathrg sampathrg changed the title from "Integrating in-mem, inline, beta search into GH DiskANN" to "Support for inline-beta filtered search with expressions" on Mar 19, 2026

6 participants