Skip to content

[EPIC] Improve Comet planning #4005

@andygrove

Description

@andygrove

What is the problem the feature request solves?

Comet currently eagerly tries to convert each operator to a Comet operator. This sometimes leads to inefficient plans that can be slower than just falling back to Spark.

One example is where a Comet JVM shuffle is inserted for a child plan that runs in Spark, to convert to columner shuffle, and then the next stage converts back to row-based right away.

+- HashAggregate
  +- CometNativeColumnarToRow
     +- CometColumnarExchange
        +- HashAggregate

Another example we saw recently was with DPP fallback, and there was a fix applied for that specific case.

I think it is time to start looking at the overall planning strategy.

Document to discuss:

https://docs.google.com/document/d/1ux_SwZPd64VTtuTC9lC46oIZGKcoH6KhWOS8xlqwdx8/edit?usp=sharing

Tracking for related issues:

Describe the potential solution

No response

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions