Skip to content

feat: add min_value parameter for anomaly detection tests#979

Open
caterinaandreu wants to merge 10 commits intoelementary-data:masterfrom
caterinaandreu:feature/add-min-value-parameter
Open

feat: add min_value parameter for anomaly detection tests#979
caterinaandreu wants to merge 10 commits intoelementary-data:masterfrom
caterinaandreu:feature/add-min-value-parameter

Conversation

@caterinaandreu
Copy link
Copy Markdown

@caterinaandreu caterinaandreu commented Apr 8, 2026

Summary

Adds an optional min_value parameter to all anomaly detection tests (column_anomalies, all_columns_anomalies, table_anomalies, dimension_anomalies). When set, an anomaly is only flagged if metric_value >= min_value, providing an absolute floor that prevents false positives on small but statistically significant values.

Motivation: When monitoring metrics like null_percent, z-score-based detection can trigger on tiny absolute changes (e.g., 0% → 0.5% nulls) that are statistically anomalous but practically insignificant. The existing ignore_small_changes parameter uses relative thresholds that break down when the training mean is near zero. min_value gives users a simple, business-meaningful absolute threshold.

Closes elementary-data/elementary#2177

Changes

  • test_column_anomalies.sql, test_all_columns_anomalies.sql, test_table_anomalies.sql, test_dimension_anomalies.sql — added min_value to macro signatures and forwarded to configuration.
  • get_anomalies_test_configuration.sql — resolves min_value via get_test_argument (supports vars/model/test precedence) and stores it in test_configuration.
  • get_anomaly_query.sql — new min_value_condition macro integrated into anomaly_score_condition, inside the z-score branch (so fail_on_zero still works independently).
  • test_anomaly_test_configuration.py — added min_value to configuration precedence test.
  • test_column_anomalies.py — added parametrized behavioral test validating anomaly suppression when metric_value < min_value.

Usage example

models:
  - name: my_model
    columns:
      - name: my_column
        tests:
          - elementary.column_anomalies:
              column_anomalies:
                - null_percent
              min_value: 5  # Only flag if null_percent >= 5%

Test plan

  • Configuration precedence test (test_anomaly_test_configuration) validates min_value is resolved correctly from vars, model config, and test config
  • Behavioral test (test_column_anomalies_with_min_value) validates anomaly is detected without min_value and suppressed with min_value above the metric value
  • Run integration tests on contributor's platform

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Added a min_value parameter for anomaly detection tests to suppress anomalies when metric values are below a configured threshold.
    • Anomaly evaluation now respects the configured minimum threshold when determining anomalous metrics.
  • Tests

    • New integration tests validate that min_value is applied and changes pass/fail outcomes as expected.
  • Bug Fixes

    • Explicitly provided falsy test arguments (e.g., 0) are now accepted instead of being ignored.

Open with Devin

Add an optional min_value parameter that provides an absolute floor for
anomaly detection. When set, anomalies are only flagged if
metric_value >= min_value. This prevents false positives on small,
practically insignificant values (e.g. 0.5% nulls flagged as anomalous
when the training mean is near zero).

Supported across all anomaly test types: column_anomalies,
all_columns_anomalies, table_anomalies, and dimension_anomalies.

Ref: elementary-data/elementary#2177
Made-with: Cursor
Add min_value to the configuration precedence test to verify it is
correctly resolved from vars, model config, and test config levels.

Add a parametrized behavioral test that validates min_value suppresses
anomaly detection when the metric value falls below the threshold.

Ref: elementary-data/elementary#2177
Made-with: Cursor
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

👋 @caterinaandreu
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 8, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3a5b2e76-02c2-4bac-988e-939e11c5b131

📥 Commits

Reviewing files that changed from the base of the PR and between 13c36f9 and d551624.

📒 Files selected for processing (7)
  • macros/edr/tests/test_all_columns_anomalies.sql
  • macros/edr/tests/test_column_anomalies.sql
  • macros/edr/tests/test_dimension_anomalies.sql
  • macros/edr/tests/test_event_freshness_anomalies.sql
  • macros/edr/tests/test_freshness_anomalies.sql
  • macros/edr/tests/test_table_anomalies.sql
  • macros/edr/tests/test_volume_anomalies.sql
🚧 Files skipped from review as they are similar to previous changes (3)
  • macros/edr/tests/test_all_columns_anomalies.sql
  • macros/edr/tests/test_table_anomalies.sql
  • macros/edr/tests/test_column_anomalies.sql

📝 Walkthrough

Walkthrough

Adds an optional numeric min_value parameter across anomaly test macros and configuration generation, forwards it into anomaly SQL generation to gate anomalies by metric value, and updates integration tests to verify behavior when min_value is present or omitted.

Changes

Cohort / File(s) Summary
Integration tests
integration_tests/tests/test_anomaly_test_configuration.py, integration_tests/tests/test_column_anomalies.py
Added min_value to test parameter matrix and expected adapted config; added parametrized test verifying pass/fail behavior when min_value is set or omitted.
Test macro signatures
macros/edr/tests/test_all_columns_anomalies.sql, macros/edr/tests/test_column_anomalies.sql, macros/edr/tests/test_dimension_anomalies.sql, macros/edr/tests/test_table_anomalies.sql, macros/edr/tests/test_event_freshness_anomalies.sql, macros/edr/tests/test_freshness_anomalies.sql, macros/edr/tests/test_volume_anomalies.sql
Added optional min_value=none parameter to anomaly test macros and adjusted several default parameters from falsenone; forwarded min_value into elementary.get_anomalies_test_configuration(...).
Configuration generation
macros/edr/tests/test_configuration/get_anomalies_test_configuration.sql
Added min_value argument, resolved via elementary.get_test_argument("min_value", ...), and injected "min_value" into the produced test_configuration.
Anomaly query logic / utils
macros/edr/tests/test_utils/get_anomaly_query.sql
Added min_value_condition(min_value) macro and ANDed it into anomaly_score_condition(...) so anomaly predicates require metric_value >= min_value when provided.
System utility
macros/edr/system/system_utils/get_test_argument.sql
Changed early-return check to value is defined and value is not none to preserve falsy-but-valid arguments (e.g., 0).

Sequence Diagram(s)

sequenceDiagram
  participant TestRunner as Test Runner
  participant DBT as DBT macro
  participant Config as elementary.get_anomalies_test_configuration
  participant DB as Database

  TestRunner->>DBT: invoke anomaly test macro (may include `min_value`)
  DBT->>Config: call get_anomalies_test_configuration(min_value=...)
  Config->>DB: generate anomaly SQL (includes min_value_condition)
  DB-->>TestRunner: returns results (anomaly flagged only if metric_value >= min_value)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I sniffed the data, tiny blips in tow,
A gentle floor now tells me when to show.
min_value set, the small hops slip away,
True alarms remain for the bright new day. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add min_value parameter for anomaly detection tests' clearly and concisely summarizes the main change—adding a new min_value parameter across multiple anomaly test macros.
Linked Issues check ✅ Passed All requirements from issue #2177 are met: min_value parameter added to anomaly tests, supports decimal values, respects configuration precedence (vars/model/test), only flags anomalies when metric_value >= min_value, and parameter is optional.
Out of Scope Changes check ✅ Passed All changes are directly related to the linked issue: new min_value parameter across test macros, integration into get_anomalies_test_configuration and anomaly_score_condition logic, supporting tests, and bug fix to get_test_argument for proper null handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as resolved.

get_test_argument uses `{% if value %}` which treats 0 as falsy,
silently dropping min_value=0. Guard with an explicit `is none` check
so that 0 is preserved as a valid threshold value.

Ref: elementary-data/elementary#2177
Made-with: Cursor
@caterinaandreu caterinaandreu temporarily deployed to elementary_test_env April 8, 2026 10:39 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@arbiv arbiv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Looks great.

Comment thread macros/edr/tests/test_table_anomalies.sql Outdated
@caterinaandreu caterinaandreu temporarily deployed to elementary_test_env April 14, 2026 08:50 — with GitHub Actions Inactive
@caterinaandreu caterinaandreu requested a review from arbiv April 14, 2026 09:02
devin-ai-integration[bot]

This comment was marked as resolved.

@caterinaandreu caterinaandreu temporarily deployed to elementary_test_env April 14, 2026 09:18 — with GitHub Actions Inactive
The `is none` / `is number` guards broke vars and model config
resolution because Jinja Undefined is neither none nor a number.
Use the same direct get_test_argument pattern as all other parameters.
The min_value=0 falsy edge case is a pre-existing get_test_argument
limitation shared by other params like fail_on_zero.

Made-with: Cursor
Shorten the test function and case names to avoid exceeding the
63-character table name limit in PostgreSQL during dbt seed.

Revert the min_value is-none guard in get_anomalies_test_configuration
to use get_test_argument directly, matching all other parameters.

Made-with: Cursor
The truthy check `{% if value %}` silently drops valid falsy values
like 0 (for min_value) and false (for fail_on_zero). Replace with
`{% if value is defined and value is not none %}` so that 0, false,
and other falsy-but-valid values are correctly returned, while
Undefined and None still fall through to model/vars config lookup.

Made-with: Cursor
@caterinaandreu caterinaandreu temporarily deployed to elementary_test_env April 14, 2026 10:33 — with GitHub Actions Inactive
devin-ai-integration[bot]

This comment was marked as resolved.

…from_training

The get_test_argument fix (is defined and is not none) correctly
preserves explicit falsy values, but test macros with false defaults
would short-circuit model/var config inheritance. Change defaults
to none so "not set" is distinguishable from "explicitly false".
No fallback-to-false needed since downstream code already treats
none as falsy (same as false).

Made-with: Cursor
@caterinaandreu caterinaandreu deployed to elementary_test_env April 14, 2026 13:07 — with GitHub Actions Active
@caterinaandreu
Copy link
Copy Markdown
Author

Hi @arbiv, it looks like the test is failing due to this connectivity error: [Microsoft][ODBC Driver 18 for SQL Server] Communication link failure (0) (SQLExecDirectW). Would it be possible to retry it? I don’t have the permissions to do so. Thanks!

@caterinaandreu
Copy link
Copy Markdown
Author

Hi @arbiv, it looks like the test is failing due to this connectivity error: [Microsoft][ODBC Driver 18 for SQL Server] Communication link failure (0) (SQLExecDirectW). Would it be possible to retry it? I don’t have the permissions to do so. Thanks!

All checks pass now! ✅ @arbiv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add min_value parameter for column anomaly detection tests

2 participants