add adaptive batch size heuristic for filtered search #309

yuejiaointel wants to merge 4 commits into main
Conversation

rfsaliev left a comment:
Thank you for the good proposal.

Requested changes:

- Please apply the same improvements to `range_search()` and to `vamana_index_impl.h` as well.

Suggestions:

- There are some performance-related suggestions in the inline comments.
- During the review I found that the `compute_filtered_batch_size()` logic is a prediction of the further amount of processing, based on previous processing results and the requested number of matches, i.e. `PredictFurtherProcessing(processed, hits, goal)`. So I would declare this function more generically, move it to a utilities header with a more common signature, and reuse it in `vamana_index_impl.h` as well. In that case, the `% max_batch_size` operation should be applied outside of this function:
```cpp
/// @param processed - number of already processed elements (total_checked)
/// @param hits - number of matched elements (found)
/// @param goal - number of requested elements to be matched (needed)
/// @param hint - result to be returned if the prediction fails, e.g. other params == 0
size_t predict_further_processing(size_t processed, size_t hits, size_t goal, size_t hint) {
    if (processed * hits * goal == 0) {
        return hint;
    }
    // use prediction formula below
    ...
}
```

Inline comment on `@@ -136,6 +153,8 @@ class DynamicVamanaIndexImpl`:

```cpp
            }
        }
    }
    batch_size =
        compute_filtered_batch_size(found, k, total_checked, batch_size);
```
Good idea, but from a performance perspective I would slightly change the code:

- Compute the batch size at the beginning of the `do-while` loop; this avoids the computation when `found == k`.
- Increment `total_checked` outside of the `for` loop.
- It might make sense to set the initial batch size to the max of `k` and `search_window_size`.

E.g.:
```cpp
size_t total_checked = 0;
auto batch_size = std::max(k, sp.buffer_config_.get_search_window_size());
do {
    batch_size =
        compute_filtered_batch_size(found, k, total_checked, batch_size);
    iterator.next(batch_size);
    for (auto& neighbor : iterator.results()) {
        if (filter->is_member(neighbor.id())) {
            result.set(neighbor, i, found);
            found++;
            if (found == k) {
                break;
            }
        }
    }
    total_checked += iterator.size();
```
Thanks, added these changes.
```cpp
double hit_rate = static_cast<double>(found) / total_checked;
return static_cast<size_t>((needed - found) / hit_rate);
```
I would also try to improve the performance here:

- FP64 computation is not very performant.
- Computation precision is not very important here.
- There are potential issues in the SVS BatchIterator in case of a huge batch size.

So I would use the following formula:

- `hit_rate_inv = 1 / hit_rate = checked / found`
- `result = (needed - found) / hit_rate = (needed - found) * hit_rate_inv = needed * checked / found - checked`
- The formula `needed * checked / found - checked` is the most precise, but there is a bigger risk of overflow for huge `needed` and `checked` values.
```diff
-double hit_rate = static_cast<double>(found) / total_checked;
-return static_cast<size_t>((needed - found) / hit_rate);
+auto hit_rate = total_checked / found + 1; // found == 0 is handled above; +1 to increase the result, eliminating INT precision issues
+return (needed - found) * hit_rate % max_batch_size; // max_batch_size - constant
```
Alternative (assuming that FP32 is fast enough):

```diff
-double hit_rate = static_cast<double>(found) / total_checked;
-return static_cast<size_t>((needed - found) / hit_rate);
+float new_batch_size = static_cast<float>(needed) * total_checked / found - total_checked;
+return static_cast<size_t>(new_batch_size) % max_batch_size;
```
Thanks, added. We probably need to run some benchmarks before knowing the exact performance impact.
- Rename compute_filtered_batch_size to predict_further_processing and move to svs_runtime_utils.h for reuse
- Use float arithmetic instead of double for hit rate calculation
- Compute batch size at loop start to avoid unnecessary computation
- Use iterator.size() instead of per-element increment for total_checked
- Initial batch size = max(k, search_window_size)
- Apply adaptive batch size to vamana_index_impl.h filtered search
- Cap batch size with std::min instead of modulo to avoid SIGFPE
- Add comments explaining adaptive batch sizing logic
Currently the filtered k-NN search loop uses `batch_size = k` when calling `iterator.next()`. When the filter is restrictive (e.g., only 1% of IDs pass), this results in many expensive graph-traversal rounds to collect enough valid results.

This PR introduces a heuristic that adapts the batch size based on the observed filter hit rate.
For example, with k=10 and a 10% filter pass rate: instead of ~100 rounds of 10 candidates, it converges in ~2 rounds.