[common] Fix O(2^n) complexity in FileIndexPredicate.getRequiredNames #7332

Open
dubin555 wants to merge 2 commits into apache:master from dubin555:oss-scout/verify-fix-fileindex-predicate-exponential-complexity

@dubin555 dubin555 commented Mar 2, 2026

Purpose

Linked issue: close #7230

FileIndexPredicate.getRequiredNames() calls child.visit(this) twice per child in its CompoundPredicate visitor — once discarding the result, then again to collect it. Since PredicateBuilder.or() produces right-nested binary trees via reduce(), this doubles work at each tree level, resulting in O(2^n) time complexity.

For an IN clause with 20 values (which produces a nested OR tree of depth 19), this means on the order of 2^20 (about one million) leaf visits instead of 20. In production, queries with moderately sized IN clauses hang indefinitely.

The fix removes the redundant child.visit(this) call (line 130), matching the correct pattern already used in PredicateVisitor.FieldNameCollector.
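
The double visit described above, and the effect of removing it, can be sketched with a toy tree. This is a minimal illustration, not Paimon's code: `DoubleVisitDemo`, `Node`, `Leaf`, `Or`, and the `doubleVisit` flag are hypothetical names introduced here, and the exact visit count depends on the tree shape, so it is not meant to reproduce the figures quoted in this PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; these are NOT Paimon's predicate classes.
final class DoubleVisitDemo {

    interface Node {
        Set<String> collect(boolean doubleVisit, AtomicInteger leafVisits);
    }

    static final class Leaf implements Node {
        private final String field;

        Leaf(String field) {
            this.field = field;
        }

        @Override
        public Set<String> collect(boolean doubleVisit, AtomicInteger leafVisits) {
            leafVisits.incrementAndGet(); // count every time a leaf is reached
            Set<String> names = new HashSet<>();
            names.add(field);
            return names;
        }
    }

    static final class Or implements Node {
        private final List<Node> children;

        Or(Node left, Node right) {
            this.children = Arrays.asList(left, right);
        }

        @Override
        public Set<String> collect(boolean doubleVisit, AtomicInteger leafVisits) {
            Set<String> names = new HashSet<>();
            for (Node child : children) {
                if (doubleVisit) {
                    // The redundant call: the result is discarded, but the whole
                    // subtree is still traversed.
                    child.collect(doubleVisit, leafVisits);
                }
                names.addAll(child.collect(doubleVisit, leafVisits));
            }
            return names;
        }
    }

    // Builds a right-nested binary OR chain over n leaves and counts leaf visits.
    static int countLeafVisits(int n, boolean doubleVisit) {
        Node tree = new Leaf("f" + (n - 1));
        for (int i = n - 2; i >= 0; i--) {
            tree = new Or(new Leaf("f" + i), tree);
        }
        AtomicInteger visits = new AtomicInteger();
        tree.collect(doubleVisit, visits);
        return visits.get();
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16}) {
            System.out.println(n + " leaves: single-visit=" + countLeafVisits(n, false)
                    + ", double-visit=" + countLeafVisits(n, true));
        }
    }
}
```

With the flag off, each leaf is reached once, so the count equals the number of leaves. With it on, every extra nesting level doubles the work done below it, which is the exponential growth this PR removes.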

The bug was introduced in ebdfa02bd ("[hotfix] Correct visitors for TransformPredicate"), which refactored the visitor to handle TransformPredicate and accidentally left the duplicate call.

Tests

  • FileIndexPredicateTest.testGetRequiredNamesLinearComplexity() — builds a 20-element OR chain, counts leaf visits via AtomicInteger. Asserts exactly 20 visits (linear). Before fix: 1,048,575 visits (exponential).
  • FileIndexPredicateTest.testGetRequiredNamesPerformance() — builds a 20-element OR chain, asserts completion within 100ms.
  • FileIndexPredicateTest.testGetRequiredNamesBasic() — verifies correctness: all field names are collected from a compound predicate.
  • FileIndexPredicateTest.testGetRequiredNamesSinglePredicate() — verifies single leaf predicate returns the correct field name.
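
The visit-counting approach used by the linear-complexity test can be sketched without any Paimon or JUnit dependency. Everything below is an assumption made for illustration: `VisitCountCheck`, its nested `Visitor`, `Predicate`, `LeafPredicate`, `CompoundPredicate`, and `CountingNameCollector` types are hypothetical miniatures, and the real test in this PR may be structured differently.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical miniature of a predicate visitor; names do not mirror Paimon's real API.
final class VisitCountCheck {

    interface Visitor<T> {
        T visitLeaf(LeafPredicate leaf);

        T visitCompound(CompoundPredicate compound);
    }

    interface Predicate {
        <T> T visit(Visitor<T> visitor);
    }

    static final class LeafPredicate implements Predicate {
        final String fieldName;

        LeafPredicate(String fieldName) {
            this.fieldName = fieldName;
        }

        @Override
        public <T> T visit(Visitor<T> visitor) {
            return visitor.visitLeaf(this);
        }
    }

    static final class CompoundPredicate implements Predicate {
        final List<Predicate> children;

        CompoundPredicate(Predicate... children) {
            this.children = Arrays.asList(children);
        }

        @Override
        public <T> T visit(Visitor<T> visitor) {
            return visitor.visitCompound(this);
        }
    }

    // Collects field names, visiting each child exactly once, and counts leaf visits.
    static final class CountingNameCollector implements Visitor<Set<String>> {
        final AtomicInteger leafVisits = new AtomicInteger();

        @Override
        public Set<String> visitLeaf(LeafPredicate leaf) {
            leafVisits.incrementAndGet();
            Set<String> names = new LinkedHashSet<>();
            names.add(leaf.fieldName);
            return names;
        }

        @Override
        public Set<String> visitCompound(CompoundPredicate compound) {
            Set<String> names = new LinkedHashSet<>();
            for (Predicate child : compound.children) {
                names.addAll(child.visit(this)); // single visit per child
            }
            return names;
        }
    }

    // Builds a right-nested chain with one leaf per (distinct) field name.
    static Predicate chain(int n) {
        Predicate tree = new LeafPredicate("f" + (n - 1));
        for (int i = n - 2; i >= 0; i--) {
            tree = new CompoundPredicate(new LeafPredicate("f" + i), tree);
        }
        return tree;
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        CountingNameCollector collector = new CountingNameCollector();
        Set<String> names = chain(20).visit(collector);
        check(collector.leafVisits.get() == 20, "expected one visit per leaf");
        check(names.size() == 20, "expected every field name to be collected");
        System.out.println("leaf visits: " + collector.leafVisits.get());
    }
}
```

Asserting an exact visit count makes the check independent of machine speed, which is why it can serve as a regression test where a wall-clock threshold would not.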

API and Format

No.

Documentation

No.

Generative AI tooling

Generated-by: Claude Code 1.0.33

Remove redundant child.visit(this) call in getRequiredNames() that caused
exponential time complexity for deeply nested OR predicates (e.g. IN clauses).
The visitor called child.visit(this) twice per child — once discarding the
result, then again using it — doubling work at each tree level.

IN clauses with <= 20 values produce right-nested OR trees; for a tree of
depth N this caused O(2^N) leaf visits instead of O(N), pinning CPUs in
production.

Closes apache#7230
* fix, this would hang due to O(2^n) complexity.
*/
@Test
public void testGetRequiredNamesPerformance() {
Contributor

Shouldn't this test be in the paimon-micro-benchmark project to avoid slowing down regular builds?

The testGetRequiredNamesPerformance test with System.nanoTime()
assertions could be flaky on slower CI machines. The remaining
testGetRequiredNamesLinearComplexity test already verifies the fix
deterministically by asserting exact visit counts (20 vs 2^20-1).
@dubin555
Author

Good point! The testGetRequiredNamesPerformance test with System.nanoTime() assertions is indeed fragile and could be flaky on slower CI machines.

I've removed it — the testGetRequiredNamesLinearComplexity test already verifies the fix by asserting the exact visit count (20 leaf visits instead of 2^20-1), which is a deterministic correctness check rather than a timing-based one. That should be sufficient as a regression test.

Development

Successfully merging this pull request may close these issues.

[Bug] FileIndexPredicate getRequiredNames() redundant child.visit() causing Exponential algorithmic complexity

2 participants