I opened #3446 initially to apply a bunch of changes from #3349 to CometNativeScan. The PR got away from me though, particularly due to DPP. I had it working for the static DPP rules, but the dynamic ones that come from AQE are tricky. I am opening this issue to track the things I'd like to tackle in smaller chunks rather than one giant PR:
I opened #3446 initially to apply a bunch of changes from #3349 to CometNativeScan. The PR got away from me though, particularly due to DPP. I had it working for the static DPP rules, but the dynamic ones that come from AQE are tricky. I am opening this issue to track the things I'd like to tackle in smaller chunks rather than one giant PR:
SparkFilePartitionof tasks to every partition. This simply reduces serde overhead for large scans. Addressed by feat: CometNativeScan per-partition plan serde #3511.PlanDynamicPruningFiltersand are easier for Comet support since this rule runs before Comet's rules. Addressed by feat: non-AQE DPP for native Parquet scans, broadcast exchange reuse for DPP subqueries #4011.PlanAdaptiveDynamicPruningFiltersand are difficult for Comet support since this rule runs after Comet's rules. I'll summarize my learning from feat: CometNativeScan per-partition plan data, add DPP [iceberg] #3446:BroadcastHashJoinwithCometBroadcastHashJoin, whichPlanAdaptiveDynamicPruningFiltersdoes not recognize.PlanAdaptiveDynamicPruningFiltersruns. This requires registering new Comet rules after where they currently run. I tried to create a simple rule to defer justBroadcastHashJoinreplacement until later, but this became too complicated with multiple scan implementations. I think when we pare down our scan implementations, we can revisit a broader redesign of Comet rules in a way that works better with AQE. We will need this for stronger Spark 4.0 support.CometScanorCometBatchScan. This might allow us to collapse our CometScanRule and CometExecRule. Part of the larger rule refactoring discussed above.