Conversation
- Simplify Search trait: move processor/output buffer to method-level generics - Remove Internal<FullPrecision> strategy split; use RemoveDeletedIdsAndCopy for delete ops - Add DefaultSearchStrategy aggregate trait combining SearchStrategy + HasDefaultProcessor - Update benchmark-core helpers to use aggregate trait (reduce recurring bounds) - Wire range search output buffer through to caller (support dynamic output handling) - Add no-op SearchOutputBuffer impl for () type to preserve compatibility
…roviders This commit moves the determinant_diversity_post_process module from diskann to diskann-providers, as it does not depend on diskann internals and logically belongs with other post-processing logic in the providers layer. Changes: - Move determinant_diversity_post_process.rs from diskann/src/graph/search/ to diskann-providers/src/model/graph/provider/async_/ - Update all imports across workspace to use diskann_providers location - Add diskann-providers dependency to diskann-benchmark-core (required for DeterminantDiversitySearchParams access) - Remove old module reference from diskann/src/graph/search/mod.rs - Update diskann-benchmark, diskann-disk imports to use new location Validated with: - cargo clippy --workspace --all-targets -- -D warnings - cargo fmt --all This results in cleaner architectural separation where determinant-diversity search parameters stay with the provider infrastructure that implements them.
…uce clones - Add DeterminantDiversityError enum for parameter validation - Convert DeterminantDiversitySearchParams::new() to return Result<Self, Error> - Validate top_k > 0, eta >= 0.0 and finite, power > 0.0 and finite - Optimize post_process_with_eta_f32: precompute projections to eliminate vector clones - Optimize post_process_greedy_orthogonalization_f32: single r_star_copy before projection loop - Expand test suite from 3 to 11 tests (7 validation + 4 algorithm tests) - Update callsites in disk_index/search.rs and index/search/knn.rs for error handling - Add early validation checks in main router function
- Extract shared run-loop logic into reusable helpers - Route both knn and determinant-diversity through closure-based parameter builders - Preserve determinant-diversity parameter validation/error propagation - Reduce duplicated benchmark orchestration code
- Promote DelegateDefaultPostProcessor as the canonical trait in glue - Remove compatibility alias layer for HasDefaultProcessor - Rename all trait bounds/impls/usages across diskann, providers, disk, benchmark, and label-filter - Keep delegate_default_post_process! macro usage aligned with trait naming
- Add runtime filter_start_points flag to RemoveDeletedIdsAndCopy and Rerank - Route default search through runtime-configurable processors (no FilterStartPoints pipeline) - Set inplace-delete search processors to filter_start_points=false - Remove Internal<T> strategy indirection and update async providers accordingly
- Preserve inner search_post_processor for Cached<S> inplace-delete path - Add CachedPostProcess wrapper to avoid PostProcess impl overlap - Keep default post-processing delegation unchanged for normal search
The post-processing refactor should not touch diskann-quantization. Restores license headers on generated flatbuffer files and reverts a lazy_format! call-site change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Determinant diversity post-processing should be landed separately. This removes: - determinant_diversity_post_process.rs from diskann-providers - determinant_diversity.rs from diskann-benchmark-core - All diversity-related PostProcess impls (full_precision, disk_provider) - Diversity benchmark infrastructure (run_determinant_diversity, DeterminantDiversityKnn trait, search_determinant_diversity) - Diversity input parsing and validation from async_ and disk inputs - Diversity example JSON files Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the PostProcess trait indirection with HRTB bounds directly on search methods. The key insight is that SearchAccessor<'a> has no 'where Self: 'a' clause, making the HRTB 'for<'a> PP: SearchPostProcess< S::SearchAccessor<'a>, T, O>' safe even with generic S. Changes in diskann/: - Remove PostProcess trait, DelegateDefaultPostProcessor trait, DefaultPostProcess ZST, blanket impl, delegate_default_post_process! macro - Add HasDefaultProcessor trait and has_default_processor! macro - Update DefaultSearchStrategy = SearchStrategy + HasDefaultProcessor - Update InplaceDeleteStrategy: SearchPostProcessor now carries the HRTB SearchPostProcess bound directly - Search trait now requires S: SearchStrategy (needed for GAT projection) - All Search impls (Knn, RecordedKnn, Range, Diverse, Multihop) call processor.post_process() directly instead of strategy.post_process_with() - DiskANNIndex::search() uses HasDefaultProcessor::create_processor() - DiskANNIndex::search_with() takes PP with HRTB bound - Update test provider to use HasDefaultProcessor + CopyIds Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Delete all forwarding PostProcess impl blocks (10 impls), rename DelegateDefaultPostProcessor to HasDefaultProcessor across all provider delete CachedPostProcess<P> newtype (replaced by Pipeline<Unwrap, S::SearchPostProcessor>), and replace longhand SearchStrategy + HasDefaultProcessor with DefaultSearchStrategy where appropriate. Net: -383 lines of pure forwarding boilerplate eliminated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert 3 SearchPostProcess implementations from manual async desugaring (fn -> impl Future + Send with async move block) to native async fn. The recursive test_spawning in provider.rs is kept manual because it needs an explicit 'static bound on the returned future. Added T: Sync bound to RemoveDeletedIdsAndCopy impl because async fn captures all parameters (including unused &T) in the future, requiring &T: Send → T: Sync. This is always satisfied at call sites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove leftover SearchOutputBuffer, IntoANNResult, and Neighbor imports in debug_provider.rs that were only used by the deleted PostProcess impls. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore main's structural approach: post-processors (RemoveDeletedIdsAndCopy, Rerank) are clean ZSTs that only handle deletion filtering and reranking. Start-point filtering is composed via Pipeline<FilterStartPoints, Base> at the type level: - HasDefaultProcessor returns Pipeline<FilterStartPoints, Base> (filters start points during regular search) - InplaceDeleteStrategy returns Base directly (no start-point filtering during delete operations) This eliminates the runtime 'filter_start_points: bool' flag, makes the post-processors synchronous again (no .await needed), and restores their Error types to Infallible/Panics instead of ANNError. Also reverts diskann-benchmark/src/backend/index/search/knn.rs to main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the unnecessary search_core helper that the PR introduced. Inline the search body back into Knn::search, matching main's structure. The only semantic difference from main is the Option A change: processor is now a method parameter instead of coming from strategy.post_processor(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #828 +/- ##
==========================================
+ Coverage 89.28% 89.33% +0.05%
==========================================
Files 442 443 +1
Lines 83009 83484 +475
==========================================
+ Hits 74111 74579 +468
- Misses 8898 8905 +7
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR refactors DiskANN’s search pipeline to decouple post-processing from SearchStrategy, allowing callers to pass an explicit post-processor (and output buffer) per search call. This enables custom reranking/filtering/ID translation without creating new strategy types, and updates providers/benchmarks accordingly.
Changes:
- Introduces
glue::DefaultPostProcessor(+default_post_processor!macro) and removesPostProcessorfromSearchStrategy. - Updates the
Searchtrait andDiskANNIndexto accept an explicit post-processor (search_with) while keeping a conveniencesearch()for default processors. - Reworks range search to write directly into a
SearchOutputBuffer(addsVec<Neighbor<_>>buffer support) and removes the old range-search output struct.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/neighbor/mod.rs | Adds SearchOutputBuffer impl for Vec<Neighbor<_>> + tests. |
| diskann/src/graph/test/provider.rs | Updates test strategy to DefaultPostProcessor + new inplace-delete post-processor wiring. |
| diskann/src/graph/search/range_search.rs | Range search now writes through SearchOutputBuffer with inline distance filtering wrapper + tests. |
| diskann/src/graph/search/multihop_search.rs | Plumbs explicit post-processor/output-buffer generics into multihop search. |
| diskann/src/graph/search/mod.rs | Updates Search trait signature and public re-exports; adjusts docs. |
| diskann/src/graph/search/knn_search.rs | Plumbs explicit post-processor/output-buffer generics into k-NN and recorded k-NN. |
| diskann/src/graph/search/diverse_search.rs | Plumbs explicit post-processor/output-buffer generics into diverse search. |
| diskann/src/graph/mod.rs | Updates re-exports to drop removed RangeSearchOutput. |
| diskann/src/graph/index.rs | Adds search_with, makes search() require DefaultPostProcessor, updates call sites. |
| diskann/src/graph/glue.rs | Removes PostProcessor assoc type; adds DefaultPostProcessor/DefaultSearchStrategy + macro; updates inplace delete trait. |
| diskann-providers/src/model/graph/provider/layers/betafilter.rs | Delegates DefaultPostProcessor through wrapper pipeline. |
| diskann-providers/src/model/graph/provider/async_/inmem/test.rs | Updates test strategy to DefaultPostProcessor. |
| diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs | Moves default post-processing to DefaultPostProcessor impls. |
| diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs | Moves default post-processing to DefaultPostProcessor impls. |
| diskann-providers/src/model/graph/provider/async_/inmem/product.rs | Removes Internal workaround; updates default + delete post-processing wiring. |
| diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs | Removes Internal workaround; updates default + delete post-processing wiring. |
| diskann-providers/src/model/graph/provider/async_/debug_provider.rs | Removes Internal workaround; updates default + delete post-processing wiring. |
| diskann-providers/src/model/graph/provider/async_/common.rs | Removes now-unneeded Internal<T> wrapper type. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Rebuilds default/delete post-processing delegation with new traits and accessor constraints. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes Internal workaround; updates default + delete post-processing wiring. |
| diskann-providers/src/index/wrapped_async.rs | Updates bounds to DefaultSearchStrategy for calling index.search(). |
| diskann-providers/src/index/diskann_async.rs | Updates test helpers and range-search tests to use output buffers (Vec of neighbors). |
| diskann-label-filter/src/inline_beta_search/inline_beta_filter.rs | Delegates DefaultPostProcessor through inline beta filtering wrapper. |
| diskann-garnet/src/provider.rs | Removes Internal strategy usage; updates default + delete post-processing wiring. |
| diskann-disk/src/search/provider/disk_provider.rs | Implements DefaultPostProcessor for disk search strategy; updates output-buffer bounds. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Updates benchmark strategy bounds to DefaultSearchStrategy. |
| diskann-benchmark/src/backend/disk_index/build.rs | Adjusts metadata accessors (ndims()/npoints()). |
| diskann-benchmark-core/src/search/graph/range.rs | Range benchmark now collects results via the passed output buffer. |
| diskann-benchmark-core/src/search/graph/multihop.rs | Updates benchmark strategy bounds to DefaultSearchStrategy. |
| diskann-benchmark-core/src/search/graph/knn.rs | Updates benchmark strategy bounds to DefaultSearchStrategy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
Please remember to create a new version number before merging since search API is revised |
| @@ -810,10 +784,6 @@ impl<T: VectorRepr> SearchStrategy<GarnetProvider<T>, [T], u32> for FullPrecisio | |||
| ) -> Result<Self::SearchAccessor<'a>, Self::SearchAccessorError> { | |||
There was a problem hiding this comment.
@metajack - I removed the "default" post-processing definition from the FullPrecision search strategy targeting internal IDs. As far as I could tell, it isn't used now that InplaceDeleteStrategy provides its own post-processor. Happy to add it back though you want!
| result_ids.as_mut_slice(), | ||
| result_dists.as_mut_slice(), | ||
| ); | ||
| // Post-process results directly into the output buffer, filtering by radius. |
There was a problem hiding this comment.
@hailangx - Can you help me double-check the change in logic here? The big difference is that results are written directly into a SearchOutputBuffer (this is possible now that the interface has been relaxed into an "append-only" interface - meaning that dynamically sized buffers can be a thing).
One thing I'm a little worried about is that it's still possible to call range-search with a fixed size buffer, and I'm wondering if we should invest in generic machinery to make that impossible. One advantage of still allowing a fixed-size buffer is that it provides a natural way of limiting the number of results - but may be non-obvious to callers. What do you think?
|
Decouple post-processing from
SearchStrategyso that search methods accept an explicit post-processor parameter. This enables callers to supply custom post-processing logic (reranking, filtering, ID translation) without defining new strategy types.Search trait changes
The
Searchtrait now requiresS: SearchStrategy<DP, T>at the trait level, and introduces two new generic parameters on thesearchmethod:O- the output type written to the buffer (used to be a parameter ofSearchStrategy)PP— the post-processor, bounded byfor<'a> SearchPostProcess<S::SearchAccessor<'a>, T, O> + Send + SyncOB— the output buffer, bounded bySearchOutputBuffer<O> + Send + ?SizedPreviously,
OBwas a trait-level generic and the post-processor was obtained internally from the strategy. Now both are method-level generics, giving callers control over each independently.The output "id" type
Ohas been removed fromSearchStrategyentirely, which greatly simplifies things.At the
DiskANNIndexlevel, two entry points are provided:search()— requiresS: DefaultPostProcessor, creates the strategy's default processor automatically, then delegates tosearch_with(). This is the common path.search_with()— requires onlyS: SearchStrategy, accepts any compatiblePP. This is the extension point for custom post-processing.DefaultPostProcessor
Strategies opt in to default post-processing by implementing
DefaultPostProcessor:The
default_post_processor!macro covers the common case where the processor isDefault-constructible.DefaultSearchStrategyis a convenience alias forSearchStrategy + DefaultPostProcessor.Start-point filtering
The in-mem code used to rely on a special
Internaltype to customize post-processing for the inplace delete pipeline.Now that post-processing is customizable, this workaround is no longer needed.
Start-point filtering is composed at the type level using the existing
Pipelinemechanism:HasDefaultProcessor::Processor = Pipeline<FilterStartPoints, RemoveDeletedIdsAndCopy>— filters start points during regular searchInplaceDeleteStrategy::SearchPostProcessor = RemoveDeletedIdsAndCopy— no start-point filtering during delete re-pruningRange search
To respond to moving the output buffer to a method level geneirc, range search now writes results directly through a
DistanceFilteredoutput buffer wrapper that applies radius and inner-radius filtering inline during post-processing. This replaces the previous approach of allocating intermediateVecs and copying results through anIdDistancebuffer.SearchOutputBufferis now implemented forVec<Neighbor<I>>, providing an unbounded growable buffer suitable for range search and other cases where the result count is not known in advance.The
RangeSearchResultstype has been removed now that aSearchOutputBufferis properly used.Inplace Delete
Since the post-processor is now a little decoupled from
SearchStrategy, rustc is unable to fully resolve the existing bounds for the "caching" provider. Try as I might, I could not get anything to work without adding aDeleteSearchAccessorassociated type toInplaceDeleteStrategy. At the end of the day, I think this is a compromise we'll have to live with.Migration guide
If you implement
SearchStrategy— minimal changes required. Remove theOparfameter (if needed). ImplementDefaultPostProcessorwith your current post-processor if you want your strategy to work withindex.search(). Use thedefault_post_processor!macro for simple cases:If you call
index.search()— no API change for strategies that implementDefaultPostProcessor(all built-in strategies do).If you need custom post-processing — use
index.search_with()and pass your post-processor directly:If you implement
Search— the method signature has changed.PPandOBare now method-level generics, andprocessoris an explicit parameter:If you use range search results —
RangeSearchOutputno longer carriesidsanddistancesfields. Results are written to the output buffer passed tosearch(). UseVec<Neighbor<u32>>(or anySearchOutputBufferimpl) to collect them.If you implement
InplaceDeleteStrategy: Add theDeleteSearchAccessorassociated type, ensuring it is the same as the accessor selected by theSearchStrategy. This is redundant information, but needed to help constrain downstream trait bounds.