Support for inline-beta filtered search with expressions #782
- Refactored recall utilities in diskann-benchmark
- Updated tokio utilities
- Added attribute and format parser improvements in label-filter
- Updated ground_truth utilities in diskann-tools
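The recall utilities listed above compute recall@k, the fraction of a query's true nearest neighbors that the search actually returns. The sketch below is hypothetical: `recall_at_k` and `mean_recall_at_k` are illustrative names, not the diskann-benchmark API. Dividing by the size of the (possibly short) ground-truth list is an assumption, chosen so that filtered queries whose filter matches fewer than k points are not penalized.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the diskann-benchmark API): recall@k for one query.
/// Counts how many of the first `k` ground-truth ids also appear among the
/// first `k` returned ids. The denominator is the number of distinct
/// ground-truth ids considered (at most `k`), so a filtered query that has
/// fewer than `k` true matches is not penalized. An empty ground truth is
/// treated as perfect recall (a design choice, not a standard).
fn recall_at_k(ground_truth: &[u32], results: &[u32], k: usize) -> f64 {
    let truth: HashSet<u32> = ground_truth.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0;
    }
    let returned: HashSet<u32> = results.iter().take(k).copied().collect();
    let hits = returned.intersection(&truth).count();
    hits as f64 / truth.len() as f64
}

/// Mean recall@k over a batch of queries (one result list per query).
fn mean_recall_at_k(ground_truth: &[Vec<u32>], results: &[Vec<u32>], k: usize) -> f64 {
    assert_eq!(ground_truth.len(), results.len(), "one result list per query");
    if ground_truth.is_empty() {
        return 0.0;
    }
    let total: f64 = ground_truth
        .iter()
        .zip(results)
        .map(|(gt, res)| recall_at_k(gt, res, k))
        .sum();
    total / ground_truth.len() as f64
}
```

For example, ground truth `[1, 2, 3]` against results `[1, 2, 4]` at k = 3 gives a recall of 2/3.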
Pull request overview
This PR integrates label-filtered (“document”) insertion and inline beta filtered search into the DiskANN benchmark/tooling flow, enabling benchmarks that operate on { vector, attributes } documents and evaluate filtered queries.
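A document in this sense pairs a vector with its attributes. The sketch below illustrates the idea under assumptions and is not the diskann-label-filter `Document` type: the field names, the `AttrValue` enum, and the accessor methods are hypothetical. It shows why a `V: ?Sized` bound is useful: the same struct can borrow an unsized slice `[f32]` as well as a sized `Vec<f32>`.

```rust
use std::collections::HashMap;

/// Hypothetical attribute value; the real crate may use a different model.
#[derive(Debug, Clone, PartialEq)]
enum AttrValue {
    Str(String),
    Int(i64),
}

/// Sketch of a `{ vector, attributes }` document. The `?Sized` bound lets
/// `V` be an unsized type such as `[f32]`, so no owned `Vec` is required.
struct Document<'a, V: ?Sized> {
    vector: &'a V,
    attributes: HashMap<String, AttrValue>,
}

impl<'a, V: ?Sized> Document<'a, V> {
    fn new(vector: &'a V, attributes: HashMap<String, AttrValue>) -> Self {
        Self { vector, attributes }
    }

    /// Borrow the underlying vector.
    fn vector(&self) -> &V {
        self.vector
    }

    /// Look up one attribute by name.
    fn attribute(&self, key: &str) -> Option<&AttrValue> {
        self.attributes.get(key)
    }
}
```

Without `?Sized`, `Document<'_, [f32]>` would not compile, and callers holding a slice would have to copy it into an owned container first.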
Changes:
- Added DocumentInsertStrategy and supporting public types to insert/query Document objects (vector + attributes) through DocumentProvider.
- Extended inline beta filter search to handle predicate encoding failures and added a constructor for InlineBetaStrategy.
- Added a new benchmark input/backend (document-index-build) plus an example config for running document + filter benchmarks.
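The handling of predicate encoding failures can be pictured with a small sketch. It assumes, for illustration only, that (a) each `field == value` pair is mapped to an integer label id at build time, (b) a predicate that mentions a pair never seen at build time cannot be encoded, so the encoded form is an `Option`, and (c) inline beta search multiplies the distance of candidates that satisfy the filter by a factor `beta` (0 < beta <= 1) and leaves other candidates' distances unchanged. `LabelMapper` and `EncodedFilter` are hypothetical names, the predicate is simplified to a conjunction of equalities, and only the name `is_valid_filter` comes from this PR; this is not the diskann-label-filter API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical build-time mapping from (field, value) to a label id.
struct LabelMapper {
    ids: HashMap<(String, String), u32>,
}

impl LabelMapper {
    fn id_of(&self, field: &str, value: &str) -> Option<u32> {
        self.ids.get(&(field.to_string(), value.to_string())).copied()
    }
}

/// Encoded form of a conjunction `f1 == v1 AND f2 == v2 ...`.
/// `None` means at least one pair was never seen at build time.
struct EncodedFilter {
    required: Option<Vec<u32>>,
}

impl EncodedFilter {
    fn encode(mapper: &LabelMapper, clauses: &[(&str, &str)]) -> Self {
        // Collecting into Option<Vec<_>> yields None as soon as one lookup fails.
        let required = clauses
            .iter()
            .map(|(field, value)| mapper.id_of(field, value))
            .collect::<Option<Vec<u32>>>();
        Self { required }
    }

    fn is_valid_filter(&self) -> bool {
        self.required.is_some()
    }

    /// Does a candidate with the given label set satisfy the filter?
    fn matches(&self, labels: &HashSet<u32>) -> bool {
        match &self.required {
            // Fast path (assumed policy): an unencodable filter matches nothing,
            // so the per-candidate label check is skipped entirely.
            None => false,
            Some(ids) => ids.iter().all(|id| labels.contains(id)),
        }
    }

    /// Assumed inline-beta rule: shrink the distance of matching candidates
    /// so they rank ahead of non-matching ones during graph search.
    fn adjusted_distance(&self, distance: f32, labels: &HashSet<u32>, beta: f32) -> f32 {
        if self.matches(labels) {
            distance * beta
        } else {
            distance
        }
    }
}
```

Treating an unencodable filter as matching nothing is only one possible policy; the PR summary does not say which fallback the real code uses.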
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| test_data/disk_index_search/data.256.label.jsonl | Updates LFS pointer for label test data used in filter benchmarks. |
| diskann-tools/src/utils/ground_truth.rs | Adds array-aware label matching/expansion and extensive tracing diagnostics for filter ground-truth generation. |
| diskann-tools/Cargo.toml | Adds serde_json dependency (and adjusts manifest metadata). |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Adds Vec<T> query support for full-precision in-mem provider (for inline beta usage). |
| diskann-label-filter/src/lib.rs | Exposes the new document_insert_strategy module under encoded_attribute_provider. |
| diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs | Adds InlineBetaStrategy::new and introduces is_valid_filter fast-path logic. |
| diskann-label-filter/src/inline_beta_search/encoded_document_accessor.rs | Adjusts filter encoding to be optional and threads is_valid_filter into the query computer. |
| diskann-label-filter/src/encoded_attribute_provider/roaring_attribute_store.rs | Makes RoaringAttributeStore public for cross-crate use. |
| diskann-label-filter/src/encoded_attribute_provider/encoded_filter_expr.rs | Changes encoded filter representation to Option, allowing “invalid filter” fallback behavior. |
| diskann-label-filter/src/encoded_attribute_provider/document_provider.rs | Allows vector types used in documents to be ?Sized. |
| diskann-label-filter/src/encoded_attribute_provider/document_insert_strategy.rs | New strategy wrapper enabling insertion/search over Document values. |
| diskann-label-filter/src/encoded_attribute_provider/ast_label_id_mapper.rs | Simplifies lookup error messaging and signature for attribute→id mapping. |
| diskann-label-filter/src/document.rs | Makes Document generic over ?Sized vectors. |
| diskann-benchmark/src/utils/tokio.rs | Adds a reusable multi-thread Tokio runtime builder. |
| diskann-benchmark/src/utils/recall.rs | Re-exports knn recall helper for benchmark use. |
| diskann-benchmark/src/inputs/mod.rs | Registers a new document_index input module. |
| diskann-benchmark/src/inputs/document_index.rs | New benchmark input schema for document-index build + filtered search runs. |
| diskann-benchmark/src/backend/mod.rs | Registers new document_index backend benchmarks. |
| diskann-benchmark/src/backend/index/result.rs | Extends search result reporting with query count and wall-clock summary columns. |
| diskann-benchmark/src/backend/document_index/mod.rs | New backend module entrypoint for document index benchmarks. |
| diskann-benchmark/src/backend/document_index/benchmark.rs | New end-to-end benchmark: build via DocumentInsertStrategy + filtered search via InlineBetaStrategy. |
| diskann-benchmark/example/document-filter.json | Adds example job configuration for document filter benchmark runs. |
| Cargo.lock | Adds serde_json to the lockfile dependencies. |
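The ground-truth changes in diskann-tools compute exact filtered neighbors. A minimal sketch of that idea follows. It assumes squared Euclidean distance and an attribute model in which a value is either a scalar string or an array of strings, and it takes "array-aware" equality to mean that `field == value` holds when the scalar equals `value` or the array contains it. `AttrValue`, `array_aware_eq`, and `filtered_ground_truth` are hypothetical names, not the diskann-tools API.

```rust
use std::collections::HashMap;

/// Hypothetical attribute value: a scalar or an array of scalars.
#[derive(Debug, Clone)]
enum AttrValue {
    Scalar(String),
    Array(Vec<String>),
}

/// `field == target` holds if the scalar equals `target` or, for arrays,
/// if any element equals `target` (the array is "expanded" into its elements).
fn array_aware_eq(attrs: &HashMap<String, AttrValue>, field: &str, target: &str) -> bool {
    match attrs.get(field) {
        Some(AttrValue::Scalar(v)) => v == target,
        Some(AttrValue::Array(vs)) => vs.iter().any(|v| v == target),
        None => false,
    }
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-`k` neighbors of `query` among base points whose id satisfies
/// `pred`, by brute force. Returns ids sorted by increasing distance.
fn filtered_ground_truth<F>(base: &[Vec<f32>], query: &[f32], k: usize, pred: F) -> Vec<usize>
where
    F: Fn(usize) -> bool,
{
    let mut scored: Vec<(f32, usize)> = base
        .iter()
        .enumerate()
        .filter(|(id, _)| pred(*id))
        .map(|(id, v)| (squared_l2(v, query), id))
        .collect();
    // total_cmp gives a total order on f32 (NaN-safe); ties are broken by id.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

Taking the predicate as a closure keeps the brute-force scan independent of how filters are represented, so the same function works for single equalities or full AND/OR expressions.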
Resolved review threads (all marked outdated):
- diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs (4 threads)
- diskann-label-filter/src/encoded_attribute_provider/encoded_filter_expr.rs
- diskann-label-filter/src/inline_beta_search/encoded_document_accessor.rs
- diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs (2 threads)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…DiskANN into sync-from-cdb-diskann
Codecov Report
❌ Patch coverage is
Coverage Diff:

| Metric | main | #782 | +/- |
|---|---|---|---|
| Coverage | 89.11% | 88.49% | -0.62% |
| Files | 443 | 445 | +2 |
| Lines | 83354 | 83807 | +453 |
| Hits | 74281 | 74169 | -112 |
| Misses | 9073 | 9638 | +565 |
This PR has the following changes:
- Adds support for inline-beta search with filter expressions built from equality comparisons combined with AND and OR.
- Adds a benchmark that evaluates performance and recall on a small dataset and also serves as an example of how to set up filtered search with expressions.
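The expression support described above can be illustrated with a small evaluator. This is a sketch under stated assumptions, not the diskann-label-filter implementation: the `Expr` enum and its `eval` method are hypothetical, attributes are modelled as a flat `HashMap<String, String>`, and only the three operators named in the PR description (equality, AND, OR) are handled.

```rust
use std::collections::HashMap;

/// Hypothetical filter expression tree with the operators from the PR.
#[derive(Debug, Clone)]
enum Expr {
    /// `field == value`
    Eq(String, String),
    /// All children must hold.
    And(Vec<Expr>),
    /// At least one child must hold.
    Or(Vec<Expr>),
}

impl Expr {
    fn eq(field: &str, value: &str) -> Self {
        Expr::Eq(field.to_string(), value.to_string())
    }

    /// Evaluates the expression against one document's attributes.
    /// A missing field never matches. An empty AND is true and an empty OR
    /// is false (the usual identities for `all` and `any`).
    fn eval(&self, attrs: &HashMap<String, String>) -> bool {
        match self {
            Expr::Eq(field, value) => attrs.get(field).map_or(false, |v| v == value),
            Expr::And(children) => children.iter().all(|c| c.eval(attrs)),
            Expr::Or(children) => children.iter().any(|c| c.eval(attrs)),
        }
    }
}
```

For example, `color == red AND (size == m OR size == xl)` becomes `Expr::And(vec![Expr::eq("color", "red"), Expr::Or(vec![Expr::eq("size", "m"), Expr::eq("size", "xl")])])`, which evaluates to true for a document with `color = red` and `size = xl`.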