[harness] apache/datafusion-python#1484: feat: add AI skill to find and improve the Pythonic interface to functions #6

Closed
sszz01 wants to merge 11 commits into harness-base-3585c11eed from harness-replay-1484-1776570560

@sszz01 sszz01 commented Apr 19, 2026

Automated replay of apache#1484

Base sha: 3585c11eed778810e3317c56c2c25a8cdc29be5b
Head sha: e36eff29a6afb65df4aa2cf5a700f71887f2ca1b

This PR exists only to exercise the deployed LogoMesh App. Do not merge.

timsaucer and others added 11 commits April 9, 2026 11:44
…uiring lit()

Update 47 functions in functions.py to accept native Python types (int, float,
str) for arguments that are contextually literals, eliminating verbose lit()
wrapping. For example, users can now write split_part(col("a"), ",", 2) instead
of split_part(col("a"), lit(","), lit(2)). All changes are backward compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
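
The backward-compatibility claim in the commit above can be illustrated with a small, self-contained sketch. It does not use the datafusion package: `Expr`, `col`, `lit`, and `split_part` below are toy stand-ins written for this example, and the inline `isinstance` wrapping is an assumption about how such coercion could look, not the code in functions.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for an expression node; NOT the datafusion Expr class."""

    kind: str
    value: object


def col(name: str) -> Expr:
    """Hypothetical column reference."""
    return Expr("column", name)


def lit(value: object) -> Expr:
    """Hypothetical literal wrapper."""
    return Expr("literal", value)


def split_part(string: Expr, delimiter: Expr | str, index: Expr | int) -> Expr:
    """Accept either an Expr or a native Python value for literal-like arguments."""
    # Wrap native values as literals; existing Expr arguments pass through unchanged.
    if not isinstance(delimiter, Expr):
        delimiter = lit(delimiter)
    if not isinstance(index, Expr):
        index = lit(index)
    return Expr("split_part", (string, delimiter, index))


# The old, explicit call style still works ...
verbose = split_part(col("a"), lit(","), lit(2))
# ... and the shorter style builds the same expression tree.
concise = split_part(col("a"), ",", 2)
```

In this sketch both call styles produce equal expression trees, which is the property that keeps existing callers working.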
…ions

Update instr and position (aliases of strpos) to accept Expr | str for
the substring parameter, matching the updated primary function signature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alias functions that delegate to a primary function must have their type
hints updated to match, even though coercion logic is only added to the
primary. Added a new Step 3 to the implementation workflow for this.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update SKILL.md to prevent three classes of issues: clarify that float
already accepts int per PEP 484 (avoiding redundant int | float that
fails ruff PYI041), add backward-compat rule for Category B so existing
Expr params aren't removed, and add guidance for inline coercion with
many optional nullable params instead of local helpers.

Replace regexp_instr's _to_raw() helper with inline coercion matching
the pattern used throughout the rest of the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
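
The `float` versus `int | float` point rests on PEP 484's rule that a parameter annotated as `float` also accepts `int` arguments for type-checking purposes. A minimal sketch follows; the function `scale` is hypothetical and exists only for illustration.

```python
def scale(value: float, factor: float = 2.0) -> float:
    """Annotated with float only; type checkers also accept int here (PEP 484)."""
    return value * factor


# Both calls are valid to a type checker, so spelling the annotation as
# `int | float` adds nothing, which is what ruff's PYI041 rule reports.
from_int = scale(3)
from_float = scale(3.0)

# The rule is about static typing only: at runtime an int is still not a float.
int_is_float_at_runtime = isinstance(3, float)
```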
…erns

Introduce coerce_to_expr() and coerce_to_expr_or_none() in expr.py as the
complement to ensure_expr() — where ensure_expr rejects non-Expr values,
these helpers wrap them via Expr.literal(). Replaces ~60 inline isinstance
checks in functions.py with single-line helper calls, and updates the
make-pythonic skill to document the new pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
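
A rough sketch of how the three helpers named above relate to each other. The names `ensure_expr`, `coerce_to_expr`, `coerce_to_expr_or_none`, and `Expr.literal` come from the commit message; the bodies, the toy `Expr` class, and its `raw` attribute are assumptions made for illustration and may differ from what expr.py actually contains.

```python
from __future__ import annotations

from typing import Any


class Expr:
    """Toy stand-in for the real Expr class (assumption, for illustration only)."""

    def __init__(self, raw: Any) -> None:
        self.raw = raw

    @staticmethod
    def literal(value: Any) -> Expr:
        return Expr(("literal", value))


def ensure_expr(value: Any) -> Expr:
    """Reject anything that is not already an Expr."""
    if not isinstance(value, Expr):
        msg = f"expected Expr, got {type(value).__name__}"
        raise TypeError(msg)
    return value


def coerce_to_expr(value: Any) -> Expr:
    """Pass Expr values through; wrap everything else as a literal."""
    return value if isinstance(value, Expr) else Expr.literal(value)


def coerce_to_expr_or_none(value: Any) -> Expr | None:
    """Like coerce_to_expr, but keep None as None for optional arguments."""
    # Compare against None explicitly so that 0, "" and False are still wrapped.
    return None if value is None else coerce_to_expr(value)
```

With helpers of this shape, a repeated inline `isinstance` check collapses to a single call such as `delimiter = coerce_to_expr(delimiter)`.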
Add Technique 1a to detect literal-only arguments in aggregate functions.
Unlike scalar UDFs which enforce literals in invoke_with_args(), aggregate
functions enforce them in accumulator() via get_scalar_value(),
validate_percentile_expr(), or downcast_ref::<Literal>(). Without this
technique, the skill would incorrectly classify arguments like
approx_percentile_cont's percentile as Category A (Expr | float) when they
should be Category B (float only). Updates the decision flow to branch on
scalar vs aggregate before checking for literal enforcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Technique 1b to detect literal-only arguments in window functions.
Window functions enforce literals in partition_evaluator() via
get_scalar_value_from_args() / downcast_ref::<Literal>(), not in
invoke_with_args() (scalar) or accumulator() (aggregate). Updates the
decision flow to branch on scalar vs aggregate vs window.

Known window functions with literal-only arguments: ntile (n), lead/lag
(offset, default_value), nth_value (n).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 7 fragile truthiness checks (x.expr if x else None) with
explicit is not None checks to prevent silent None when zero-valued
literals are passed. Widen log/power/pow type hints to Expr | int | float
with noqa: PYI041 for clarity. Add unit tests for coerce_to_expr helpers
and integration tests for pythonic calling conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
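
The truthiness problem described above is easy to reproduce in isolation. The sketch below assumes the optional argument may arrive as a native value such as `0`; the helper names and the toy `Expr` class are hypothetical, and the seven real call sites may be shaped differently.

```python
from __future__ import annotations

from typing import Any


class Expr:
    """Toy stand-in; `.expr` mirrors the attribute used in the commit message."""

    def __init__(self, raw: Any) -> None:
        self.expr = raw


def to_expr(value: Any) -> Expr:
    """Hypothetical coercion: wrap non-Expr values as literals."""
    return value if isinstance(value, Expr) else Expr(("literal", value))


def raw_or_none_fragile(value: Any) -> Any:
    # Truthiness check: 0, 0.0, "" and False are all falsy, so they are
    # silently treated as "argument not provided".
    return to_expr(value).expr if value else None


def raw_or_none_explicit(value: Any) -> Any:
    # Explicit None check: only a missing argument maps to None.
    return to_expr(value).expr if value is not None else None
```

Passing `0` to the fragile version yields `None`, while the explicit version keeps the zero as a literal.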
Add FBT003 (boolean positional value) to the per-file-ignores for
python/tests/* in pyproject.toml, and remove the 6 now-redundant
inline noqa: FBT003 comments across test_expr.py and test_context.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded "Known aggregate/window functions with literal-only
arguments" lists with instructions to discover them dynamically by
searching the upstream crate source. Keeps a few examples as validation
anchors so the agent knows its search is working correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PyThreadState_SetAsyncExc only delivers exceptions when the thread is
executing Python bytecode, not while in native (Rust/C) code. The
previous test had two issues causing flakiness on Python 3.11:

1. The interrupt fired before df.collect() entered the UDF, while the
   thread was still in native code where async exceptions are ignored.
2. time.sleep(2.0) is a single C call where async exceptions are not
   checked — they're only checked between bytecode instructions.

Fix by adding a threading.Event so the interrupt waits until the UDF is
actually executing Python code, and by sleeping in small increments so
the eval loop has opportunities to check for pending exceptions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
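
The fix described above can be sketched without DataFusion. In the example below, `interruptible_work` is a hypothetical stand-in for the UDF body, `KeyboardInterrupt` is an assumed choice of exception, and the timeouts are arbitrary; only `PyThreadState_SetAsyncExc`, `threading.Event`, and the short-sleep loop correspond to what the commit message describes. It relies on CPython, since it calls the C API through `ctypes`.

```python
import ctypes
import threading
import time


def interruptible_work(started: threading.Event, max_seconds: float = 5.0) -> str:
    """Stand-in for the UDF: signal readiness, then sleep in small slices."""
    deadline = time.monotonic() + max_seconds
    try:
        # Signal inside the try block so the exception cannot arrive before
        # the handler is in place.
        started.set()
        while time.monotonic() < deadline:
            # Short sleeps return to the eval loop often, giving the
            # interpreter a chance to notice the pending async exception.
            time.sleep(0.01)
    except KeyboardInterrupt:
        return "interrupted"
    return "timed out"


def run_and_interrupt() -> str:
    started = threading.Event()
    result = {}

    def target() -> None:
        result["outcome"] = interruptible_work(started)

    worker = threading.Thread(target=target)
    worker.start()

    # Wait until the worker is known to be executing Python code.
    if not started.wait(timeout=5.0):
        worker.join()
        return "never started"

    # Ask CPython to raise KeyboardInterrupt in the worker thread.
    ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(worker.ident), ctypes.py_object(KeyboardInterrupt)
    )
    worker.join(timeout=10.0)
    return result.get("outcome", "no result")
```

Calling `run_and_interrupt()` should report that the worker was interrupted rather than running until its deadline. Replacing the loop with a single long `time.sleep` call would delay delivery until that call returns, which is the second issue listed above.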
@sszz01 sszz01 deleted the branch harness-base-3585c11eed April 19, 2026 05:08
@sszz01 sszz01 closed this Apr 19, 2026

2 participants