Skip to content

feat: substrait builder extra api#773

Open
mbwhite wants to merge 3 commits intosubstrait-io:mainfrom
mbwhite:additional-substrait-builder
Open

feat: substrait builder extra api#773
mbwhite wants to merge 3 commits intosubstrait-io:mainfrom
mbwhite:additional-substrait-builder

Conversation

@mbwhite
Copy link
Contributor

@mbwhite mbwhite commented Mar 23, 2026

Follow on to pr #767 to fill some missing gaps in the substrait builder. We've recently used the Substrait Builder and found that there some variants and missing gaps in the APIs. We'd subclassed it to add these, so I'm now wanting to push these to the main codebase.

There wasn't a unit test already so I've added a basic one here - which I appreciate will is quite large.

@github-actions
Copy link

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@mbwhite mbwhite force-pushed the additional-substrait-builder branch from bd62615 to d0cc543 Compare March 23, 2026 11:58
@mbwhite mbwhite changed the title Additional substrait builder feat: substrait builder extra api Mar 23, 2026
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from d0cc543 to 6c079f0 Compare March 23, 2026 12:50
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch 2 times, most recently from 8618365 to 434df46 Compare March 24, 2026 10:29
@mbwhite mbwhite marked this pull request as ready for review March 24, 2026 10:34
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from 434df46 to 83fa865 Compare March 24, 2026 11:50
*/
public Expression.I32Literal i32(int v) {
return Expression.I32Literal.builder().value(v).build();
public Expression.I32Literal i32(final int value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to alter all of the signatures in this PR as final? It's adding a lot of diff to this PR and IMO it doesn't really add anything extra in terms of guarantees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding final is something my editor is set to do, but happy to work with the project guidelines;

i32 and fp64 are the only two existing methods I've altered. Everything else is new (or newly public)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'm on my phone so can't look at the rest of the file, but can we only use final for new methods if it's consistent with the other methods then? Do the other ones mark every method as final?

@mbwhite mbwhite force-pushed the additional-substrait-builder branch 2 times, most recently from 56aaa2a to 5bdace0 Compare March 24, 2026 13:09
Signed-off-by: MBWhite <whitemat@uk.ibm.com>
@mbwhite mbwhite force-pushed the additional-substrait-builder branch from 5bdace0 to ef5e526 Compare March 24, 2026 13:10
Copy link
Member

@benbellick benbellick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments. Thanks!

* @param value value to create
* @return i16 instance
*/
public Expression.I8Literal i8(final int value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public Expression.I8Literal i8(final int value) {
public Expression.I8Literal i8(int value) {

And yeah looking at these files on my computer now. It does look like we consistently don't use final in all of the other methods. Can we drop the usage of final in all of these places to keep our API consistent? That way we can have a separate conversation about introducing final if we want it.

#629 (comment)

* @param input the input relation to aggregate
* @return an Aggregate relation representing the grouping and aggregation operation
*/
public Aggregate aggregate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm a bit confused here. Now that this method is made public, what is the difference between this one and the above one? Just that Rel.Remap is not required, but is instead optional?

Can we instead just either a) have one aggregate builder in which the Remap is marked optional, or b) have two different impls, one in which the Remap is required, and the other in which there is no Remap argument?

Otherwise, its a bit unclear to me which one users are supposed to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my preference for now is just the one singular one where it is marked optional. This would technically be a breaking change but would be a minimal diff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants