Add in tests for Colocated joins with table hints#17846
Add in tests for Colocated joins with table hints#17846deepthi912 wants to merge 5 commits intoapache:masterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #17846 +/- ##
============================================
- Coverage 63.27% 63.25% -0.02%
- Complexity 1466 1481 +15
============================================
Files 3190 3190
Lines 192101 192287 +186
Branches 29433 29471 +38
============================================
+ Hits 121547 121627 +80
- Misses 61040 61131 +91
- Partials 9514 9529 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
...ion-tests/src/test/java/org/apache/pinot/integration/tests/ColocatedJoinIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ion-tests/src/test/java/org/apache/pinot/integration/tests/ColocatedJoinIntegrationTest.java
Outdated
Show resolved
Hide resolved
yashmayya
left a comment
There was a problem hiding this comment.
I've added a number of suggestions here, please take a look.
| /** Three-table join: result correctness with colocated hint. */ | ||
| @Test | ||
| public void testThreeTableJoinResultCorrectness() | ||
| throws Exception { | ||
| String tableOpt = getTableOptPerTableHint(); | ||
| String sql = "SELECT COUNT(*) FROM userAttributes /*+ " + tableOpt + " */ ua " | ||
| + "JOIN userGroups /*+ " + tableOpt + " */ ug ON ua.userUUID = ug.userUUID " | ||
| + "JOIN userFactEvents /*+ " + tableOpt + " */ ue ON ua.userUUID = ue.userUUID"; | ||
| JsonNode result = postQuery(sql); | ||
| assertNoExceptions(result); | ||
| long count = getCountFromResult(result); | ||
| assertTrue(count >= 1, "Three-table join should return at least one row"); | ||
| } |
There was a problem hiding this comment.
How is this test validating join result correctness?
There was a problem hiding this comment.
IMO the result should be manually validated and exact counts should be asserted instead.
There was a problem hiding this comment.
Alternatively, we can also validate against results using non colocated join queries.
| startZk(); | ||
| startController(); | ||
| startBrokers(1); | ||
| startServers(1); |
There was a problem hiding this comment.
How are we even testing colocated joins if there's a single server setup?
| } | ||
| } | ||
|
|
||
| /** Two-table join with mismatched partition_size in hints: planner rejects with partition size mismatch error |
There was a problem hiding this comment.
Maybe add one for mismatched partition function too?
| /** | ||
| * Base integration tests for colocated joins tests(multi-stage engine). | ||
| */ | ||
| public abstract class ColocatedJoinIntegrationTestBase extends BaseClusterIntegrationTestSet { |
There was a problem hiding this comment.
I think we don't have any coverage for composite join keys (multi column)?
There was a problem hiding this comment.
Another case is when table is partitioned on one column but we're using a different column as the join key. We need positive / negative tests for this.
| /** | ||
| * Base integration tests for colocated joins tests(multi-stage engine). | ||
| */ | ||
| public abstract class ColocatedJoinIntegrationTestBase extends BaseClusterIntegrationTestSet { |
There was a problem hiding this comment.
What happens if there are equi + non-equi join conditions with colocated join hints?
|
|
||
| /** Two-table join with joinOptions colocated but only one table has tableOptions hint (partial config). */ | ||
| @Test | ||
| public void testTwoTableJoinWithPartialTableOptionsResultCorrectness() |
| /** Two-table join with joinOptions(is_colocated_by_join_keys='null'): explicit 'null' disables colocated exchange | ||
| * (plan does not use [PARTITIONED]); query still succeeds with correct result. */ | ||
| @Test | ||
| public void testTwoTableJoinWithJoinOptionsColocatedKeyExplicitNullResultCorrectness() |
There was a problem hiding this comment.
What is the point of this test?
| * assertion {@code [PARTITIONED]}). Adds third table (userFactEvents) and three-table colocated | ||
| * join tests. | ||
| */ | ||
| public class ColocatedJoinMultiPartitionIntegrationTest extends ColocatedJoinIntegrationTestBase { |
There was a problem hiding this comment.
Can we make these tests also support realtime or hybrid tables? The partitioning info is calculated differently in those cases
| * assertion {@code [PARTITIONED]}). Adds third table (userFactEvents) and three-table colocated | ||
| * join tests. | ||
| */ | ||
| public class ColocatedJoinMultiPartitionIntegrationTest extends ColocatedJoinIntegrationTestBase { |
There was a problem hiding this comment.
Can we also have some tests for partition_parallelism?
There was a problem hiding this comment.
This one supports multiple workers per partition
| assertNoExceptions(result); | ||
| String plan = extractImplementationPlan(result); | ||
| assertNotNull(plan, "implementation plan should be present"); | ||
| assertTrue(plan.contains(PLAN_PARTITIONED_MARKER), message + plan); |
There was a problem hiding this comment.
This is a very weak assertion. We should check it's on the correct stage / exchange, no unnecessary shuffles elsewhere in the plan, number of PARTITIONED markers etc. Also ideally we want to assert on the actual query stage stats too (validating that raw messages is 0 / there are only in-memory messages across the colocated stages) since this PARTITIONED marker is a weak assertion (you can trace the code path that adds this to the explain plan output).
In reference to #16603 there have been changes to the colocated joins and the query hints used there. This PR adds in integration tests that can help us test the configs through which the colocated joins can be identified and be enabled for the user so the hash join and regular data shuffles are eliminated as the data and worker operations is colocated on a single server for the same partitions of the tables that would be joined in the query.