feat: add min_value parameter for anomaly detection tests#979
feat: add min_value parameter for anomaly detection tests#979caterinaandreu wants to merge 10 commits intoelementary-data:masterfrom
Conversation
Add an optional min_value parameter that provides an absolute floor for anomaly detection. When set, anomalies are only flagged if metric_value >= min_value. This prevents false positives on small, practically insignificant values (e.g. 0.5% nulls flagged as anomalous when the training mean is near zero). Supported across all anomaly test types: column_anomalies, all_columns_anomalies, table_anomalies, and dimension_anomalies. Ref: elementary-data/elementary#2177 Made-with: Cursor
Add min_value to the configuration precedence test to verify it is correctly resolved from vars, model config, and test config levels. Add a parametrized behavioral test that validates min_value suppresses anomaly detection when the metric value falls below the threshold. Ref: elementary-data/elementary#2177 Made-with: Cursor
|
👋 @caterinaandreu |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (7)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds an optional numeric Changes
Sequence Diagram(s)sequenceDiagram
participant TestRunner as Test Runner
participant DBT as DBT macro
participant Config as elementary.get_anomalies_test_configuration
participant DB as Database
TestRunner->>DBT: invoke anomaly test macro (may include `min_value`)
DBT->>Config: call get_anomalies_test_configuration(min_value=...)
Config->>DB: generate anomaly SQL (includes min_value_condition)
DB-->>TestRunner: returns results (anomaly flagged only if metric_value >= min_value)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
get_test_argument uses `{% if value %}` which treats 0 as falsy,
silently dropping min_value=0. Guard with an explicit `is none` check
so that 0 is preserved as a valid threshold value.
Ref: elementary-data/elementary#2177
Made-with: Cursor
arbiv
left a comment
There was a problem hiding this comment.
Thanks for the contribution! Looks great.
Made-with: Cursor
The `is none` / `is number` guards broke vars and model config resolution because Jinja Undefined is neither none nor a number. Use the same direct get_test_argument pattern as all other parameters. The min_value=0 falsy edge case is a pre-existing get_test_argument limitation shared by other params like fail_on_zero. Made-with: Cursor
Shorten the test function and case names to avoid exceeding the 63-character table name limit in PostgreSQL during dbt seed. Revert the min_value is-none guard in get_anomalies_test_configuration to use get_test_argument directly, matching all other parameters. Made-with: Cursor
The truthy check `{% if value %}` silently drops valid falsy values
like 0 (for min_value) and false (for fail_on_zero). Replace with
`{% if value is defined and value is not none %}` so that 0, false,
and other falsy-but-valid values are correctly returned, while
Undefined and None still fall through to model/vars config lookup.
Made-with: Cursor
…from_training The get_test_argument fix (is defined and is not none) correctly preserves explicit falsy values, but test macros with false defaults would short-circuit model/var config inheritance. Change defaults to none so "not set" is distinguishable from "explicitly false". No fallback-to-false needed since downstream code already treats none as falsy (same as false). Made-with: Cursor
|
Hi @arbiv, it looks like the test is failing due to this connectivity error: |
Summary
Adds an optional
min_valueparameter to all anomaly detection tests (column_anomalies,all_columns_anomalies,table_anomalies,dimension_anomalies). When set, an anomaly is only flagged ifmetric_value >= min_value, providing an absolute floor that prevents false positives on small but statistically significant values.Motivation: When monitoring metrics like
null_percent, z-score-based detection can trigger on tiny absolute changes (e.g., 0% → 0.5% nulls) that are statistically anomalous but practically insignificant. The existingignore_small_changesparameter uses relative thresholds that break down when the training mean is near zero.min_valuegives users a simple, business-meaningful absolute threshold.Closes elementary-data/elementary#2177
Changes
test_column_anomalies.sql,test_all_columns_anomalies.sql,test_table_anomalies.sql,test_dimension_anomalies.sql— addedmin_valueto macro signatures and forwarded to configuration.get_anomalies_test_configuration.sql— resolvesmin_valueviaget_test_argument(supports vars/model/test precedence) and stores it intest_configuration.get_anomaly_query.sql— newmin_value_conditionmacro integrated intoanomaly_score_condition, inside the z-score branch (sofail_on_zerostill works independently).test_anomaly_test_configuration.py— addedmin_valueto configuration precedence test.test_column_anomalies.py— added parametrized behavioral test validating anomaly suppression whenmetric_value < min_value.Usage example
Test plan
test_anomaly_test_configuration) validatesmin_valueis resolved correctly from vars, model config, and test configtest_column_anomalies_with_min_value) validates anomaly is detected withoutmin_valueand suppressed withmin_valueabove the metric valueMade with Cursor
Summary by CodeRabbit
New Features
Tests
Bug Fixes