Epic: CometNativeScan improvements (per-partition serde, cleanup, DPP, AQE DPP, V2 operator)

I opened #3446 initially to apply a bunch of changes from #3349 to CometNativeScan. The PR got away from me though, particularly due to DPP. I had it working for the static DPP rules, but the dynamic ones that come from AQE are tricky. I am opening this issue to track the things I'd like to tackle in smaller chunks rather than one giant PR:
- [x] Per-partition serde to no longer send every `SparkFilePartition` of tasks to every partition. This simply reduces serde overhead for large scans. Addressed by #3511.
- [x] DPP (non-AQE) for V1 operator. These runtime filters are created by Spark's `PlanDynamicPruningFilters` and are easier for Comet support since this rule runs before Comet's rules. Addressed by #4011.
- [ ] DPP (AQE) for V1 operator. These runtime filters are created by Spark's `PlanAdaptiveDynamicPruningFilters` and are difficult for Comet support since this rule runs after Comet's rules. I'll summarize my learning from #3446:
  - Comet's rules replace things like `BroadcastHashJoin` with `CometBroadcastHashJoin`, which `PlanAdaptiveDynamicPruningFilters` does not recognize.
  - We can't modify Spark rules, so we could wait until after `PlanAdaptiveDynamicPruningFilters` runs. This requires registering new Comet rules after where they currently run. I tried to create a simple rule to defer just `BroadcastHashJoin` replacement until later, but this became too complicated with multiple scan implementations. I think when we pare down our scan implementations, we can revisit a broader redesign of Comet rules in a way that works better with AQE. We will need this for stronger Spark 4.0 support.
- [ ] CometNativeBatchScan operator. See #3481. Will not support DPP, so pretty quick to implement.
- [ ] Minor refactor tasks: consider renaming these (CometNativeScan, CometNativeBatchScan) since we now have other scan types. Don't wrap them in `CometScan` or `CometBatchScan`. This might allow us to collapse our CometScanRule and CometExecRule. Part of the larger rule refactoring discussed above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Epic: CometNativeScan improvements (per-partition serde, cleanup, DPP, AQE DPP, V2 operator) #3510

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Epic: CometNativeScan improvements (per-partition serde, cleanup, DPP, AQE DPP, V2 operator) #3510

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions