
feat: add CompactMask for memory-efficient crop-RLE mask storage #2159

Open

Borda wants to merge 30 commits into develop from debug/oom

Conversation


@Borda Borda commented Mar 2, 2026

This pull request adds support for using the new CompactMask class throughout the supervision library, enabling more efficient storage and manipulation of segmentation masks. The changes ensure that CompactMask can be seamlessly used in place of dense numpy arrays in Detections, annotators, metrics, and utility functions, while maintaining backward compatibility. Comprehensive integration tests are also added to verify correct behavior.

Core support for CompactMask in Detections and related utilities:

  • Detections now accepts CompactMask as the mask field, and all relevant methods (e.g., merging, area calculation, validation) are updated to handle CompactMask efficiently. This includes optimized merging and area computation without materializing dense arrays.
  • The annotate method in annotators (e.g., MaskAnnotator) is updated to efficiently paint masks using CompactMask crops, avoiding unnecessary memory allocation.

Integration with utility functions and metrics:

  • Utility functions such as calculate_masks_centroids and get_mask_size_category now accept CompactMask and operate efficiently on its representation.
  • The move_detections function in inference_slicer adjusts CompactMask offsets directly for efficient mask translation.

Testing and validation:

  • A comprehensive integration test suite is added in test_compact_mask_integration.py, covering construction, filtering, merging, annotation, and equality checks for Detections with CompactMask.

General improvements and imports:

  • Necessary imports and type annotations are updated throughout the codebase to support CompactMask and ensure type safety.

These changes collectively enable efficient, flexible, and transparent use of compact mask representations across the entire supervision library.

Copilot AI review requested due to automatic review settings March 2, 2026 18:48
@Borda Borda requested a review from SkalskiP as a code owner March 2, 2026 18:48
@Borda Borda added the enhancement New feature or request label Mar 2, 2026
Member Author

Borda commented Mar 2, 2026

Mask Storage Format Comparison

Assumptions

| Parameter | Value | Notes |
|---|---|---|
| Image size | 4K — 3840×2160 = 8.29 MP | Aerial / high-res camera |
| Avg object bounding box | 80×80 px = 6,400 px² | Small object in large scene |
| Fill ratio within bbox | ~65% | Non-trivial shapes (concavities, partial occlusion) |
| Avg contour vertices | ~400 pts | Post-findContours, complex segmentation |
| Avg RLE runs / mask | ~240 (3 runs × 80 rows) | Non-trivial = multiple spans per row |
| Memory bandwidth | 30 GB/s | Modern CPU, cache-warm |
| NVMe throughput | 1 GB/s | memmap read/write |
| Bbox overlap rate (aerial) | ~1% of pairs | Sparse scene, non-overlapping objects |
| cv2.findContours (crop) | ~0.2 ms/mask | Complex shape, 80×80 crop |
| cv2.fillPoly (crop) | ~0.2 ms/mask | Complex shape, 80×80 crop |

Encode/decode times are per-batch estimates. Single-mask overhead (Python dispatch, NumPy call)
is ignored at N=10 but becomes relevant at N=1 in tight loops.

xyxy is always present in Detections — crop bounds are free, no mask scan needed for encode.
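Because crop bounds come for free from xyxy, the crop+offset encode is a plain slice and the decode is a paste onto a zeroed canvas. A minimal numpy sketch of both paths (function names are illustrative, not the PR's API):

```python
import numpy as np

def encode_crop(mask: np.ndarray, xyxy: tuple[int, int, int, int]):
    """Slice the bbox region out of a full (H, W) bool mask.

    Returns the crop plus its (x, y) offset. O(bbox area) — no scan of
    the full image is needed because the xyxy bounds are already known.
    """
    x1, y1, x2, y2 = xyxy
    return mask[y1:y2, x1:x2].copy(), (x1, y1)

def decode_crop(crop: np.ndarray, offset: tuple[int, int],
                shape: tuple[int, int]) -> np.ndarray:
    """Paste a crop back onto a zeroed full-size canvas (the O(I) path)."""
    canvas = np.zeros(shape, dtype=bool)
    x1, y1 = offset
    h, w = crop.shape
    canvas[y1:y1 + h, x1:x1 + w] = crop
    return canvas
```

The encode touches only the bbox area A; the decode cost is dominated by allocating and zeroing the full (H, W) canvas.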


Space

| Format | Per-object | N = 10 | N = 100 | N = 1,000 | vs Dense |
|---|---|---|---|---|---|
| Dense (current) | 8.29 MB | 82.9 MB | 829 MB | 8.3 GB | 1× |
| Local Crop + Offset | 6.4 KB | 64 KB | 640 KB | 6.4 MB | 1,300× |
| Crop RLE | ~2 KB (240 runs × int32 pair) | 20 KB | 200 KB | 2 MB | 4,000× |
| Polygon ⚠ lossy | ~3.2 KB (400 pts × float32 xy) | 32 KB | 320 KB | 3.2 MB | 2,600× |
| memmap | 8.29 MB (disk) | 82.9 MB | 829 MB | 8.3 GB | 1× (disk) |

Notes:

  • Local Crop stores the full bounding-box rectangle (including background within bbox); the ~35%
    non-mask pixels are wasted. Crop RLE only encodes actual runs, so it wins for sparse objects.
  • Polygon vertex count grows with shape complexity; holes or disconnected contours add extra
    contours. For a mask with k holes: vertices ≈ (k+1) × avg_contour_length.
  • memmap saves RAM by paging to disk but total data is identical to Dense.

Encode time: dense → format

Source is a single (H, W) bool array per mask. xyxy bounds are known.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(A) — strided slice from xyxy | ~0.1 ms | ~1 ms | ~10 ms |
| Crop RLE | O(A) — scan crop rows for runs | ~0.2 ms | ~2 ms | ~20 ms |
| Polygon | O(P) — cv2.findContours on crop | ~2 ms | ~20 ms | ~200 ms |
| memmap | O(I) — write 8.29 MB to disk | ~80 ms | ~800 ms | ~8,000 ms |

Notes:

  • Local Crop reads ~307 KB per mask from the source array (80 rows × 3840-byte stride) but writes
    only 6.4 KB; dominated by strided read, not write.
  • Polygon encode is ~20× slower than local crop due to findContours overhead even on the crop.
  • memmap is write-bound by NVMe; unsuitable for real-time inference pipelines.

Decode time: format → full (H, W) mask

Required by: MaskAnnotator, mask_iou_batch, move_masks, resize_masks, merge().
The dominant cost for all formats is allocating and zeroing an 8.29 MB array.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(I) alloc-zeros + O(A) paste | ~3 ms | ~30 ms | ~300 ms |
| Crop RLE | O(I) alloc-zeros + O(A) expand + paste | ~3 ms | ~30 ms | ~300 ms |
| Polygon | O(I) alloc-zeros + O(A) fillPoly | ~5 ms | ~50 ms | ~500 ms |
| memmap | O(I) read from NVMe | ~80 ms | ~800 ms | ~8,000 ms |

Key insight: at 4K resolution, zeroing 8.29 MB per mask dominates for all in-memory formats.
Local Crop and Crop RLE are identical here — the format difference is irrelevant once you must
materialize the full image canvas.


Decode time: format → crop only (optimized path)

Possible when callers only need the object's local region, not the full image canvas.
Applicable to: contains_holes, contains_multiple_segments, filter_segments_by_distance,
area property, and — with coord-offset awareness — mask_to_polygons, MaskAnnotator crop path.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(1) — already in memory | 0 ms | 0 ms | 0 ms |
| Crop RLE | O(A) — expand 240 runs | ~0.02 ms | ~0.2 ms | ~2 ms |
| Polygon | O(A) — fillPoly on crop canvas | ~2 ms | ~20 ms | ~200 ms |
| memmap | N/A — always full-size | ~80 ms | ~800 ms | ~8,000 ms |

Local Crop is uniquely suited here: the stored crop is the decompressed form. No
decode step exists. This is the primary performance argument over Crop RLE for the common case.


IoU / NMS time

Pairs to evaluate = N(N-1)/2. At 1% bbox overlap rate (aerial, sparse scene):

| Format | Strategy | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Dense (current, resize to 640) | All pairs, 640² pixel AND | <1 ms | ~100 ms | ~10,000 ms |
| Local Crop + Offset | Bbox pre-filter → pixel IoU on intersection only | <1 ms | <1 ms | ~5 ms |
| Crop RLE | Bbox pre-filter → expand intersection crops | <1 ms | ~1 ms | ~15 ms |
| Polygon | Bbox pre-filter → rasterize intersection crops | <1 ms | ~10 ms | ~150 ms |
| memmap | Same as Dense but reads from disk | <1 ms | ~100 ms | ~10,000 ms |

Dense at N=1,000: 499,500 pairs × 409,600 (640²) pixels = 204 billion ops ≈ 10 s.
The current code mitigates this via configurable memory chunking, but the ops count is unchanged.

Local Crop with bbox pre-filter at N=1,000: ~5,000 overlapping pairs (1%) × ~400 px
intersection region each = 2M ops ≈ 2,000× faster. The 99% non-overlapping pairs cost O(1)
each (vectorized xyxy IoU check).
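The bbox pre-filter strategy can be sketched in numpy: an O(N²) vectorised box-overlap test first, then pixel IoU restricted to the intersection window of the few overlapping pairs (all names illustrative, not the PR's functions):

```python
import numpy as np

def boxes_overlap(xyxy: np.ndarray) -> np.ndarray:
    """Vectorised (N, N) bool matrix: True where bboxes intersect."""
    x1 = np.maximum(xyxy[:, None, 0], xyxy[None, :, 0])
    y1 = np.maximum(xyxy[:, None, 1], xyxy[None, :, 1])
    x2 = np.minimum(xyxy[:, None, 2], xyxy[None, :, 2])
    y2 = np.minimum(xyxy[:, None, 3], xyxy[None, :, 3])
    return (x2 > x1) & (y2 > y1)

def mask_iou_sparse(crops, offsets, xyxy):
    """Pixel IoU only for bbox-overlapping pairs, computed on the
    intersection window; non-overlapping pairs stay at IoU 0."""
    n = len(crops)
    iou = np.zeros((n, n))
    overlap = boxes_overlap(xyxy)
    for a in range(n):
        for b in range(a + 1, n):
            if not overlap[a, b]:
                continue
            # intersection window in image coordinates
            x1 = max(xyxy[a, 0], xyxy[b, 0]); y1 = max(xyxy[a, 1], xyxy[b, 1])
            x2 = min(xyxy[a, 2], xyxy[b, 2]); y2 = min(xyxy[a, 3], xyxy[b, 3])
            # shift the window into each crop's local coordinates
            wa = crops[a][y1 - offsets[a][1]:y2 - offsets[a][1],
                          x1 - offsets[a][0]:x2 - offsets[a][0]]
            wb = crops[b][y1 - offsets[b][1]:y2 - offsets[b][1],
                          x1 - offsets[b][0]:x2 - offsets[b][0]]
            inter = np.logical_and(wa, wb).sum()
            union = crops[a].sum() + crops[b].sum() - inter
            iou[a, b] = iou[b, a] = inter / union if union else 0.0
    return iou
```

The inner pixel work is proportional to the intersection area only, which is what collapses the N=1,000 cost from billions of pixel ops to a few million.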


Other properties

| Property | Dense | Local Crop + Offset | Crop RLE | Polygon | memmap |
|---|---|---|---|---|---|
| Lossless | ✓ | ✓ | ✓ | ✗ thin structures, 1-px features | ✓ |
| NumPy drop-in | ✓ | ✗ needs wrapper | ✗ needs wrapper | ✗ needs wrapper | ✓ |
| merge() across Detections | ✓ same H,W | ✓ with image_shape metadata | ✓ same | ✗ requires rasterization | ✓ same H,W |
| COCO interop | manual | manual | native | native | manual |
| Functions needing no change | all | area, contains_holes, contains_multiple_segments, filter_segments_by_distance | same | same | all |
| Functions needing updates | — | MaskAnnotator, mask_iou_batch, move_masks, calculate_centroids, resize_masks, mask_to_polygons, merge() | same + RLE decode | same + polygon rasterize | — |
| External deps | none | none | none | none (or pycocotools for C speed) | none |
| Implementation complexity | — | Low–Medium | Medium | Medium–High | Very low |
| Worst case | any large image | object fills image (A ≈ I) | complex texture (R ≈ A/2, checkerboard) | irregular / thin / holey shapes | I/O bottleneck |
| Best case | few large objects | many small objects (A ≪ I) | very sparse objects | smooth convex shapes | RAM-constrained system |

Summary

Local Crop + Offset is the right choice for the stated problem (aerial imagery, many small objects).

The decisive advantages:

  1. Memory — 1,300× smaller than dense for the stated scenario. RLE is ~3× smaller still but
    only for objects that are themselves sparse; for typical solid segmentation masks the difference
    is minor.
  2. NMS/IoU — the bbox pre-filter reduces computation from O(N² × I) to O(N² + N_overlap × A),
    a 2,000× speedup at N=1,000 in sparse scenes.
  3. Crop-only decode — O(1), enabling topology analysis (contains_holes, etc.) and eventually
    a crop-aware MaskAnnotator without ever allocating a full image-sized array.
  4. Encode is free — slice from xyxy bounds already present in Detections.

Crop RLE is worth considering only if object masks are themselves internally sparse (e.g.,
mesh objects, grid patterns) or if COCO RLE interop is a hard requirement. For typical solid
instance-segmentation masks in aerial imagery it adds decode overhead with minimal space gain.

Polygon is a non-starter for lossless use cases: encode cost is ~20× higher than local crop, decode requires fillPoly, and fine detail is lost.

memmap solves a different problem (RAM exhaustion via swap) without reducing data size and
adds I/O latency at every access. Not applicable here.

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new CompactMask representation (crop + RLE) and integrates it across supervision so instance segmentation masks can be stored/processed more memory-efficiently while remaining largely compatible with existing Detections/annotators/utilities.

Changes:

  • Added CompactMask implementation with RLE encode/decode, slicing, merging, and offset translation support.
  • Updated Detections.merge, Detections.area, validators, annotators, inference slicer movement, and mask utilities/metrics to accept and efficiently handle CompactMask.
  • Added unit + integration tests covering CompactMask and its interaction with Detections and annotators.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/supervision/detection/compact_mask.py | New CompactMask type with crop-RLE storage and numpy interop. |
| src/supervision/detection/core.py | Enables merge/area paths to preserve and compute efficiently with CompactMask. |
| src/supervision/validators/__init__.py | Updates mask validation to accept CompactMask via a length check. |
| src/supervision/annotators/core.py | Optimizes MaskAnnotator to paint CompactMask crops without full-mask allocation. |
| src/supervision/detection/utils/masks.py | Extends centroid computation to support CompactMask without full materialization. |
| src/supervision/metrics/utils/object_size.py | Extends mask size categorization to use CompactMask.area. |
| src/supervision/detection/tools/inference_slicer.py | Translates CompactMask masks by adjusting offsets (no dense conversion). |
| src/supervision/__init__.py | Exposes CompactMask at package root import level. |
| tests/detection/test_compact_mask.py | New unit test suite for RLE helpers and CompactMask behaviors. |
| tests/detection/test_compact_mask_integration.py | New integration tests for Detections + CompactMask + annotators/merge. |

@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 95.17426% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 77%. Comparing base (c010656) to head (b2234da).

Additional details and impacted files
```text
@@           Coverage Diff            @@
##           develop   #2159    +/-   ##
========================================
+ Coverage       75%     77%    +2%
========================================
  Files           62      63     +1
  Lines         7545    7885   +340
========================================
+ Hits          5648    6034   +386
+ Misses        1897    1851    -46
```

Dense (N, H, W) bool masks cause OOM for aerial imagery (1000 objects x
4K image ~ 8.3 GB). CompactMask encodes each mask as a run-length
sequence of its bounding-box crop, reducing typical usage to ~2 MB.

- New `CompactMask` class with full duck-typed ndarray interface:
  `__getitem__`, `__array__`, `shape`, `dtype`, `area`, `sum`, `merge`,
  `with_offset` — drop-in compatible with existing `np.ndarray` masks.
- Private row-major RLE helpers: `_rle_encode`, `_rle_decode`, `_rle_area`.
- Phase 2 integration: `Detections` accepts CompactMask for `mask` field;
  `validate_mask`, `area` property, and `Detections.merge` all handle it.
- Phase 3 optimised paths: `calculate_masks_centroids` uses crop-space
  arithmetic; `MaskAnnotator` paints crop regions directly; `move_detections`
  uses `with_offset` instead of materialising dense masks; `get_mask_size_category`
  uses `mask.area`.
- 54 new tests (41 unit + 13 integration); all 17 doctests pass.
- All 1190 existing tests pass; pre-commit hooks clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
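As a rough illustration of the duck-typed surface listed above (`from_dense`, `__getitem__`, `__array__`, `shape`, `area`), here is a heavily simplified crop+offset store — a sketch of the interface only, not the PR's CompactMask implementation:

```python
import numpy as np

class MiniCompactMask:
    """Toy crop+offset mask store mimicking part of the duck-typed
    ndarray surface (illustrative; real class lives in the PR)."""

    def __init__(self, crops, offsets, image_shape):
        self.crops = list(crops)                        # (h, w) bool crops
        self.offsets = np.asarray(offsets, dtype=int)   # (N, 2) as (x, y)
        self.image_shape = image_shape                  # (H, W)

    @classmethod
    def from_dense(cls, masks: np.ndarray):
        """Encode each (H, W) mask as its tight bbox crop plus offset."""
        crops, offsets = [], []
        for m in masks:
            ys, xs = np.nonzero(m)
            if ys.size == 0:
                crops.append(np.zeros((0, 0), dtype=bool))
                offsets.append((0, 0))
                continue
            y1, y2, x1, x2 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
            crops.append(m[y1:y2, x1:x2].copy())
            offsets.append((x1, y1))
        return cls(crops, offsets, masks.shape[1:])

    @property
    def shape(self):
        return (len(self.crops), *self.image_shape)

    @property
    def area(self):
        # per-mask pixel count from crops alone — no full materialization
        return np.array([c.sum() for c in self.crops])

    def __len__(self):
        return len(self.crops)

    def __getitem__(self, idx: int) -> np.ndarray:
        # decode a single mask onto a full-size canvas
        canvas = np.zeros(self.image_shape, dtype=bool)
        x1, y1 = self.offsets[idx]
        h, w = self.crops[idx].shape
        canvas[y1:y1 + h, x1:x1 + w] = self.crops[idx]
        return canvas

    def __array__(self, dtype=None, copy=None):
        # dense (N, H, W) view for code paths that insist on ndarray
        out = np.stack([self[i] for i in range(len(self))])
        return out.astype(dtype) if dtype is not None else out
```

`np.asarray(mini_mask)` materializes the dense stack via `__array__`, which is what makes such a wrapper "drop-in compatible" at the cost of the full allocation.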
Borda and others added 5 commits March 2, 2026 20:38
Co-authored-by: Codex <codex@openai.com>
…tion

- Add `compact_mask_iou_batch` for optimised IoU computation on RLE crops (avoiding full (N, H, W) arrays).
- Enhance `mask_iou_batch` and NMS routines to support CompactMask inputs.
- Introduce `compact_masks` parameter in `InferenceSlicer` for end-to-end CompactMask handling.
- Update docstrings across affected components to reflect CompactMask integration.
@Borda Borda self-assigned this Mar 10, 2026
Borda and others added 3 commits March 10, 2026 10:40
…er integration

- Add correctness and integration tests for `compact_mask_iou_batch`, ensuring exact match with dense IoU results across multiple cases.
- Validate NMS behavior with CompactMask inputs for both isolated and overlapping masks.
- Introduce end-to-end tests in `InferenceSlicer` with `compact_masks=True`, verifying pipeline consistency against dense masks.
Member Author

Borda commented Mar 11, 2026

```text
                                                     CompactMask — benchmark summary
╭───────────────┬─────────┬──────────────┬───────┬────────────┬───────────┬───────────┬─────────┬─────────┬───────────┬──────────┬──────╮
│               │         │              │       │            │   Compact │   Compact │     Mem │    Area │    Filter │    Annot │      │
│ Scenario      │ Objects │ Resolution   │  Fill │  Dense mem │    theory │    actual │     (x) │     (x) │       (x) │      (x) │ OK?  │
├───────────────┼─────────┼──────────────┼───────┼────────────┼───────────┼───────────┼─────────┼─────────┼───────────┼──────────┼──────┤
│ FHD-100-5%    │     100 │ 1920x1080    │    5% │   207.4 MB │     33 KB │     63 KB │   6327x │  259.5x │    475.3x │    70.8x │  ✓   │
│ FHD-100-10%   │     100 │ 1920x1080    │   10% │   207.4 MB │     45 KB │     87 KB │   4564x │  275.1x │    424.6x │    48.6x │  ✓   │
│ FHD-100-20%   │     100 │ 1920x1080    │   20% │   207.4 MB │     67 KB │    137 KB │   3108x │  265.6x │    417.6x │    28.6x │  ✓   │
│ 4K-500-5%     │     500 │ 3840x2160    │    5% │  4147.2 MB │    139 KB │    250 KB │  29860x │ 1169.0x │   3559.4x │   421.3x │  ✓   │
│ 4K-500-10%    │     500 │ 3840x2160    │   10% │  4147.2 MB │    193 KB │    304 KB │  21489x │ 1125.8x │   1940.7x │   259.1x │  ✓   │
│ 4K-500-20%    │     500 │ 3840x2160    │   20% │  4147.2 MB │    284 KB │    403 KB │  14616x │ 1094.7x │   3711.7x │   134.4x │  ✓   │
│ 4K-1000-5%    │    1000 │ 3840x2160    │    5% │  8294.4 MB │    189 KB │    411 KB │  43919x │ 1145.9x │   6516.5x │   686.7x │  ✓   │
│ 4K-1000-10%   │    1000 │ 3840x2160    │   10% │  8294.4 MB │    277 KB │    498 KB │  29994x │ 1126.7x │   5935.1x │   437.5x │  ✓   │
│ 4K-1000-20%   │    1000 │ 3840x2160    │   20% │  8294.4 MB │    384 KB │    605 KB │  21606x │ 1114.7x │   5224.1x │   275.2x │  ✓   │
│ SAT-200-5%    │     200 │ 8192x8192    │    5% │ 13421.8 MB │    271 KB │    485 KB │  49454x │ 8378.0x │  18414.0x │   164.4x │  ✓   │
│ SAT-200-10%   │     200 │ 8192x8192    │   10% │ 13421.8 MB │    388 KB │    813 KB │  34574x │ 8567.8x │  12891.4x │   100.6x │  ✓   │
│ SAT-200-20%   │     200 │ 8192x8192    │   20% │ 13421.8 MB │    559 KB │   1418 KB │  23992x │ 7874.6x │  15730.2x │    59.6x │  ✓   │
╰───────────────┴─────────┴──────────────┴───────┴────────────┴───────────┴───────────┴─────────┴─────────┴───────────┴──────────┴──────╯
```

·  Compact theor. — sum of internal numpy buffer sizes
·  Compact actual — tracemalloc peak during .from_dense() (w/ Python overhead)
·  Mem x — dense / compact theoretical ratio
·  Area x — .area speedup (RLE sum, no materialisation)
·  Filter x — boolean-index speedup
·  Annot x — MaskAnnotator speedup (crop-paint vs full-frame alloc)
·  italic ms — dense skipped (array > 16 GB), compact absolute time shown

Adds examples/compact_mask/ with a standalone benchmark that demonstrates
CompactMask as a drop-in replacement for dense (N,H,W) bool mask arrays,
covering FHD / 4K / satellite (8192×8192) tiers at 5, 10, and 20 % fill.

Benchmark highlights:
- tracemalloc-based real memory measurement alongside theoretical nbytes
- DENSE_SKIP_GB threshold (12 GB) prevents swap thrashing on SAT scenarios
- LRU-cached synthetic mask generation (ellipses via cv2.ellipse)
- Staged design: stage_build / stage_area / stage_filter / stage_annotate /
  stage_correctness for clear separation of concerns
- Rich summary table with Compact theor. vs Compact actual columns
- All non-skipped scenarios verified: pixel-perfect annotation, exact area,
  lossless to_dense() roundtrip

README covers motivation, theoretical space/encode/decode/IoU analysis from
the PR design doc, drop-in API examples, and known limitations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.



Borda and others added 4 commits March 11, 2026 15:33
- Add 5 new benchmark stages: iou, nms, merge, offset, centroids
- Add tracemalloc measurement for dense masks (theory vs malloc split)
- Add per-scenario JSONL result persistence (nan → null, timestamped)
- Add parallel timing via ThreadPoolExecutor (REPETITIONS=6, PARALLEL=3)
- Add gc.collect() before each timing rep and between scenarios
- Remove functools.cache from make_detections (caused 150 GB RAM usage)
- Colour-code speedup ratios: green ≥10x, yellow 1-10x, red <1x
- Rename theor. → theory in table headers; add att./op. type labels
- Fix stage_offset broadcast error by expanding canvas by offset amount
- Fix correctness display with proper f-string concatenation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, and segments

- Add NMM tests for CompactMask, ensuring numerical consistency with dense input.
- Add `calculate_masks_centroids` tests, validating exact results across both paths.
- Add `contains_holes` and `contains_multiple_segments` tests, verifying behavior after encode-decode roundtrip.
- Refactor indexing logic in CompactMask for performance and maintainability.
- Simplify CompactMask concatenation by removing redundant `.astype()` calls.
CompactMask.repack(): re-encodes each mask crop using tight bounding
boxes, eliminating background padding from loose detector bboxes.
O(sum of crop areas); useful as a one-time cleanup after accumulating
many InferenceSlicer tile merges.

Detections.is_empty() fast path: avoids calling __eq__ which
materialised the full (N, H, W) CompactMask array just to check
emptiness — turning an O(N·H·W) check into O(1).  This was the root
cause of the 0.56x merge regression.

CompactMask.merge() now uses list.extend (C-level) instead of a flat
list comprehension, reducing Python bytecode overhead under GIL
contention.

benchmark: pre-compute half-splits outside the timed lambda so
stage_merge measures only the concatenation, not the slicing.

New tests: repack() (4 cases), NMM parity (TestNmmWithCompactMask),
centroids parity (TestCalculateMasksCentroidsCompact), contains_holes
and contains_multiple_segments roundtrip parity after CompactMask
encode/decode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Borda and others added 3 commits March 11, 2026 19:06
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Update `calculate_masks_centroids` to assign centroids of (0, 0) for all-zero tight crops, avoiding division by zero and ensuring consistency with dense implementation.
- Refine indexing logic in `CompactMask` to support Python `list[bool]` as a mask selector.
- Add tests for empty masks and boolean list indexing to ensure correctness and parity across scenarios.
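The crop-space centroid arithmetic mentioned above works because translation commutes with the mean: compute the centroid inside the crop, then add the offset (with the same (0, 0) fallback for all-zero crops). A sketch with an illustrative helper name, not the library's function:

```python
import numpy as np

def centroid_from_crop(crop: np.ndarray, offset: tuple[int, int]):
    """Centroid in full-image coordinates using only the crop.

    Because the offset is an integer translation, int(mean(local)) + offset
    equals int(mean(global)), so this matches the dense-path result.
    All-zero crops fall back to (0, 0), mirroring the behavior above.
    """
    if crop.sum() == 0:
        return (0, 0)
    ys, xs = np.nonzero(crop)
    x_off, y_off = offset
    return (int(xs.mean()) + x_off, int(ys.mean()) + y_off)
```

The cost is O(crop area) per mask rather than O(image area), which is where the centroid speedup in the benchmarks comes from.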
…ions

- Introduce `bbox_xyxy` property to compute inclusive bounding boxes for masks, enabling better metadata access and usability.
- Refine type annotations for variables like `centroids`, `flat`, and `result` to ensure clarity and type safety.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.



Borda and others added 9 commits March 11, 2026 19:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Refactor `with_offset` to clip partially or fully out-of-frame masks, ensuring they remain valid and consistent with `move_masks` behavior.
- Add iterator support to `CompactMask` for generating dense boolean arrays.
- Update `InferenceSlicer` to handle `CompactMask` offsets without dense materialization.
- Introduce extensive tests to validate clipping behavior and parity with `move_masks`.
Remove hard line-wraps from all prose paragraphs — lines now flow as
single lines. Add "Operation-by-Operation Speedup Analysis" section
covering Memory, .area, filter/__getitem__, annotate, IoU, NMS, merge,
with_offset, and centroids with numbered compounding-factor tables and
expected speedups for each.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When no crop overflows the new canvas — the common case in
InferenceSlicer where the canvas is expanded by the tile offset —
with_offset() now runs in O(N): one numpy broadcast to add (dx, dy) to
the offsets array, a vectorised bounds check, and a shared-RLE return.
No RLE data is decoded or re-encoded.

Only masks that genuinely straddle the image boundary go through the
slow decode+clip+re-encode path. This brings with_offset from 0.67x
(slower than dense) to >1 000x faster in the no-clip case.

Update examples/compact_mask/README.md to reflect the new fast path
description and summary table speedup (~40x → >1 000x).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
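The no-clip fast path described above reduces to one broadcast add plus a vectorised bounds check. A sketch under the assumption that offsets are an (N, 2) array of (x, y) pairs and crop shapes are (h, w) tuples (names illustrative):

```python
import numpy as np

def with_offset_fast(offsets, crop_shapes, image_shape, dx, dy):
    """O(N) fast path for shifting every mask by (dx, dy).

    Returns the shifted (N, 2) offsets array — the RLE/crop payloads are
    untouched and can be shared — or None when any crop would straddle
    the canvas edge, signalling the slow decode+clip+re-encode path.
    """
    shifted = np.asarray(offsets, dtype=int) + np.array([dx, dy])
    sizes = np.asarray(crop_shapes, dtype=int)   # (N, 2) as (h, w)
    img_h, img_w = image_shape
    in_bounds = (
        (shifted[:, 0] >= 0) & (shifted[:, 1] >= 0)
        & (shifted[:, 0] + sizes[:, 1] <= img_w)
        & (shifted[:, 1] + sizes[:, 0] <= img_h)
    )
    return shifted if in_bounds.all() else None
```

In the InferenceSlicer case the canvas is expanded by the tile offset, so the bounds check always passes and no per-pixel work happens.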
…erence

Compact NMS uses exact full-res crop IoU while dense NMS downsamples to
640px first. Borderline pairs near the 0.5 threshold can flip between the
two paths — this is a quality improvement in compact, not a bug.

Changes:
- stage_nms now returns a 4-tuple (dense_s, compact_s, nms_ok, n_diff)
- nms_ok is strict (n_diff == 0) — no silent tolerance
- nms_mismatch_count field added to ScenarioResult for JSON logging
- Correctness display shows nms=✗(N) with the exact count so it's clear
  how many decisions differ and whether it's a rounding artefact (1-3)
  or a real bug (many more)
- stage_nms docstring explains the resize-vs-exact quality difference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five new test classes covering the full CompactMask surface against dense
ground truth, each parametrised over 10 seeds (seeds 0-9) with varying
object counts (N=1,5,20,50) and image sizes (50x50 to 1080x1920):

- TestCompactMaskRoundtripRandom  — from_dense→to_dense pixel equality,
  shape/len, and per-index access
- TestCompactMaskAreaRandom       — .area and .sum(axis=(1,2)) match dense
- TestCompactMaskFilterRandom     — boolean and integer-list filter parity
- TestCompactMaskWithOffsetRandom — with_offset matches move_masks for
  random offsets including partial and full out-of-frame cases
- TestCompactMaskIouRandom        — compact_mask_iou_batch matches dense
  mask_iou_batch; self-IoU diagonal is 1.0; tight-bbox parity

All 198 tests pass in <1 s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Across compact_mask.py, iou_and_nms.py, and both test files:

- n → num_masks (or num_pixels in _rle_decode)
- h, w → img_h, img_w
- i (loop) → mask_idx
- i, j (iou pair loop) → idx_a, idx_b
- i (chunked loop) → chunk_start
- i (nms loop) → row_idx
- m (merge loop) → cm
- r (area comprehension) → rle
- a, b (mask arrays in tests) → masks_a, masks_b
- k (selected count) → num_selected
- g, d (jaccard loop) → gt_box, det_box

Coordinate shorthands (x1, y1, dx, dy, ix1, iy1, etc.) left unchanged
as they are standard and unambiguous in geometric contexts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssion

Dense masks were resized to 640 px before IoU computation, while CompactMask
used exact full-resolution crop IoU. For borderline pairs whose true IoU is
close to the threshold, the downscaling flipped keep/suppress decisions.

Fix: call mask_iou_batch directly on full-resolution masks for both paths.
mask_dimension parameter kept for backward compatibility but is now a no-op.

Add regression test at 1920x1080 with a borderline pair near IoU=0.5 to
prevent recurrence. Existing tests used ≤40x40 images where resize upscaled
(no information loss), so the lossy code path was never exercised.

Also revise benchmark parameter matrix: FHD-200/400, 4K-100, SAT-200 tiers;
fill fractions [0.05, 0.20, 0.50] to match real supervision/SAM-2 use cases;
IOU_DENSE_SKIP_GB=1.0 so IoU+NMS dense timing is only run for sub-1 GB tiers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NMS section: remove resize_masks/640px approximation (bug was fixed —
  both paths now call mask_iou_batch directly with exact IoU)
- Operating point: replace nonexistent 4K-500-5% with FHD-200-50%-v600
  as the primary reference scenario throughout all analysis sections
- Per-operation speedups: cite measured values from new benchmark run
  (.area 176x, filter 467x, annotate 26x, iou 464x, nms 109x,
   merge 908x, offset 2214x, centroids 19x at FHD-200-50%-v600;
   SAT-200 extremes: merge 272709x, offset 183199x)
- Tier table: 3 tiers → 6 tiers (FHD-100/200/400, 4K-100/200, SAT-200);
  fill fractions 5/10/20% → 5/20/50% (sparse/moderate/SAM-everything)
- Sample results table: 5 rows → 8 representative rows covering full
  range; add Area/Filter/Annot/IoU/NMS/Merge/Offset speedup columns;
  update skip threshold footnote (IOU_DENSE_SKIP_GB=1.0, not 12 GB)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>