perf: Element-wise comparison only for tolerance-requiring data types by MariusMerkleQC · Pull Request #26 · Quantco/diffly

Marius Merkle (MariusMerkleQC) · 2026-03-27T22:44:19Z

Motivation

Changes

Introduce a function _needs_element_wise_comparison that checks whether element-wise comparison needs to be performed; this is the case for

(1) float vs numeric columns -> absolute and relative tolerances apply (-> _is_float_numeric_pair())
(2) temporal columns -> absolute temporal tolerance applies (-> _is_temporal_pair())

In all other cases, a naive comparison suffices. When the helper returns False, the comparison takes that shortcut and skips the expensive _compare_sequence_columns(). The benchmark test shows the performance improvement.
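The gating rule described above can be sketched without Polars. This is a minimal illustration, not the code from this PR: dtypes are modelled as plain strings, `ListType` is a hypothetical stand-in for a list/array dtype, and the recursion into nested lists is an assumption about how inner dtypes might be handled.

```python
from dataclasses import dataclass

# Hypothetical dtype model for illustration only (not the Polars API).
FLOATS = {"f32", "f64"}
NUMERICS = FLOATS | {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64"}
TEMPORALS = {"date", "datetime", "time", "duration"}


@dataclass(frozen=True)
class ListType:
    """Stand-in for a list/array dtype with a single inner dtype."""

    inner: "str | ListType"


def is_float_numeric_pair(left, right):
    # Case (1): absolute/relative tolerances apply when one side is a float
    # and the other side is any numeric type.
    return (left in FLOATS and right in NUMERICS) or (
        left in NUMERICS and right in FLOATS
    )


def is_temporal_pair(left, right):
    # Case (2): the absolute temporal tolerance applies to temporal pairs.
    return left in TEMPORALS and right in TEMPORALS


def needs_element_wise_comparison(left, right):
    # Assumption: nested lists are unwrapped until the innermost dtypes.
    if isinstance(left, ListType) and isinstance(right, ListType):
        return needs_element_wise_comparison(left.inner, right.inner)
    if isinstance(left, ListType) or isinstance(right, ListType):
        # Mismatched nesting: no tolerance can apply in this sketch.
        return False
    return is_float_numeric_pair(left, right) or is_temporal_pair(left, right)
```

In this sketch, a pair of integer lists returns False and can take the cheap path, while a float list compared against an integer list returns True and would still go through the element-wise comparison.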

codecov · 2026-03-27T22:45:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (ef9aa25) to head (9fd6079).

Additional details and impacted files

@@             Coverage Diff             @@
##           benchmark       #26   +/-   ##
===========================================
  Coverage     100.00%   100.00%           
===========================================
  Files             10        10           
  Lines            758       780   +22     
===========================================
+ Hits             758       780   +22

Copilot

Pull request overview

This PR optimizes condition_equal_columns for nested list/array columns by avoiding the expensive element-wise comparison path when tolerances/special handling aren’t needed, and updates the performance benchmark accordingly.

Changes:

Add _needs_element_wise_comparison() (plus helpers) to decide when list/array columns require element-wise comparison.
Shortcut list/array comparisons to eq_missing() when element-wise handling is deemed unnecessary.
Update the performance test to assert comparable performance for list<i64> comparisons.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`diffly/_conditions.py`	Introduces dtype-based gating to skip element-wise list/array comparisons unless tolerances/special handling are needed.
`tests/test_performance.py`	Updates benchmark expectations to ensure the optimized path is not significantly slower than direct `eq_missing()`.

diffly/_conditions.py

Marius Merkle (MariusMerkleQC) · 2026-03-28T07:11:12Z

            # Otherwise, we simply compare all fields independently
            return pl.all_horizontal(
                [
                    _compare_columns(
                        col_left=col_left.struct[field],
                        col_right=col_right.struct[field],
                        dtype_left=fields_left[field],
                        dtype_right=fields_right[field],
                        max_list_length=max_list_length,
                        abs_tol=abs_tol,
                        rel_tol=rel_tol,
                        abs_tol_temporal=abs_tol_temporal,
                    )
                    for field in fields_left
                ]
            )


Should we also shortcut the comparison here?

            # Otherwise, we simply compare all fields independently
            if _needs_element_wise_comparison(dtype_left, dtype_right):
                return pl.all_horizontal(
                    [
                        _compare_columns(
                            col_left=col_left.struct[field],
                            col_right=col_right.struct[field],
                            dtype_left=fields_left[field],
                            dtype_right=fields_right[field],
                            max_list_length=max_list_length,
                            abs_tol=abs_tol,
                            rel_tol=rel_tol,
                            abs_tol_temporal=abs_tol_temporal,
                        )
                        for field in fields_left
                    ]
                )
            return col_left.eq_missing(col_right)

Ok, I added a performance test in #25, which already passes without the above optimization, so we should definitely not add it.

Marius Merkle (MariusMerkleQC) added 2 commits March 27, 2026 23:43

perf: Element-wise comparison only for tolerance-requiring data types

0618ad2

perf: Element-wise comparison only for tolerance-requiring data types

3e0d66d

Marius Merkle (MariusMerkleQC) self-assigned this Mar 27, 2026

github-actions bot added the performance label Mar 27, 2026

Marius Merkle (MariusMerkleQC) changed the base branch from main to benchmark March 27, 2026 22:44

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

feat: Tolerances for inner lists and arrays #21

Merged

Marius Merkle (MariusMerkleQC) mentioned this pull request Mar 27, 2026

test: Benchmark slowdown of element-wise list comparison #25

Open

Marius Merkle (MariusMerkleQC) added 2 commits March 28, 2026 00:00

Merge branch 'benchmark' into optimize

a4f0225

merge base

77cd4da

Marius Merkle (MariusMerkleQC) requested a review from Copilot March 27, 2026 23:02

Copilot started reviewing on behalf of Marius Merkle (MariusMerkleQC) March 27, 2026 23:03

Copilot AI reviewed Mar 27, 2026

Marius Merkle (MariusMerkleQC) added 2 commits March 28, 2026 00:30

feedback copilot

9815b75

feedback copilot

83e79b4

Marius Merkle (MariusMerkleQC) marked this pull request as ready for review March 27, 2026 23:45

Marius Merkle (MariusMerkleQC) requested review from EgeKaraismailogluQC and Oliver Borchert (borchero) as code owners March 27, 2026 23:45

Marius Merkle (MariusMerkleQC) commented Mar 28, 2026

Marius Merkle (MariusMerkleQC) and others added 2 commits March 28, 2026 08:16

add test

55766c4

Merge branch 'benchmark' into optimize

9fd6079

Marius Merkle (MariusMerkleQC) wants to merge 8 commits into benchmark from optimize

2 participants
