
feat: add CompactMask for memory-efficient crop-RLE mask storage #2159

Open

Borda wants to merge 30 commits into develop from debug/oom

Conversation


@Borda Borda commented Mar 2, 2026

This pull request adds support for using the new CompactMask class throughout the supervision library, enabling more efficient storage and manipulation of segmentation masks. The changes ensure that CompactMask can be seamlessly used in place of dense numpy arrays in Detections, annotators, metrics, and utility functions, while maintaining backward compatibility. Comprehensive integration tests are also added to verify correct behavior.

Core support for CompactMask in Detections and related utilities:

  • Detections now accepts CompactMask as the mask field, and all relevant methods (e.g., merging, area calculation, validation) are updated to handle CompactMask efficiently. This includes optimized merging and area computation without materializing dense arrays.
  • The annotate method in annotators (e.g., MaskAnnotator) is updated to efficiently paint masks using CompactMask crops, avoiding unnecessary memory allocation.

Integration with utility functions and metrics:

  • Utility functions such as calculate_masks_centroids and get_mask_size_category now accept CompactMask and operate efficiently on its representation.
  • The move_detections function in inference_slicer adjusts CompactMask offsets directly for efficient mask translation.

Testing and validation:

  • A comprehensive integration test suite is added in test_compact_mask_integration.py, covering construction, filtering, merging, annotation, and equality checks for Detections with CompactMask.

General improvements and imports:

  • Necessary imports and type annotations are updated throughout the codebase to support CompactMask and ensure type safety.

These changes collectively enable efficient, flexible, and transparent use of compact mask representations across the entire supervision library.

Copilot AI review requested due to automatic review settings March 2, 2026 18:48
@Borda Borda requested a review from SkalskiP as a code owner March 2, 2026 18:48
@Borda Borda added the enhancement New feature or request label Mar 2, 2026
Member Author

Borda commented Mar 2, 2026

Mask Storage Format Comparison

Assumptions

| Parameter | Value | Notes |
|---|---|---|
| Image size | 4K — 3840×2160 = 8.29 MP | Aerial / high-res camera |
| Avg object bounding box | 80×80 px = 6,400 px² | Small object in large scene |
| Fill ratio within bbox | ~65% | Non-trivial shapes (concavities, partial occlusion) |
| Avg contour vertices | ~400 pts | Post-findContours, complex segmentation |
| Avg RLE runs / mask | ~240 (3 runs × 80 rows) | Non-trivial = multiple spans per row |
| Memory bandwidth | 30 GB/s | Modern CPU, cache-warm |
| NVMe throughput | 1 GB/s | memmap read/write |
| Bbox overlap rate (aerial) | ~1% of pairs | Sparse scene, non-overlapping objects |
| cv2.findContours (crop) | ~0.2 ms/mask | Complex shape, 80×80 crop |
| cv2.fillPoly (crop) | ~0.2 ms/mask | Complex shape, 80×80 crop |

Encode/decode times are per-batch estimates. Single-mask overhead (Python dispatch, NumPy call)
is ignored at N=10 but becomes relevant at N=1 in tight loops.

xyxy is always present in Detections — crop bounds are free, no mask scan needed for encode.
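Because crop bounds come for free from xyxy, the crop+offset encode is a plain slice and the decode is a paste onto a zeroed canvas. A minimal numpy sketch of both paths (function names are illustrative, not the PR's API):

```python
import numpy as np

def encode_crop(mask: np.ndarray, xyxy: tuple[int, int, int, int]):
    """Slice the bbox region out of a full (H, W) bool mask.

    Returns the crop plus its (x, y) offset. O(bbox area) — no scan of
    the full image is needed because the xyxy bounds are already known.
    """
    x1, y1, x2, y2 = xyxy
    return mask[y1:y2, x1:x2].copy(), (x1, y1)

def decode_crop(crop: np.ndarray, offset: tuple[int, int],
                shape: tuple[int, int]) -> np.ndarray:
    """Paste a crop back onto a zeroed full-size canvas (the O(I) path)."""
    canvas = np.zeros(shape, dtype=bool)
    x1, y1 = offset
    h, w = crop.shape
    canvas[y1:y1 + h, x1:x1 + w] = crop
    return canvas
```

The encode touches only the bbox area A; the decode cost is dominated by allocating and zeroing the full (H, W) canvas.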


Space

| Format | Per-object | N = 10 | N = 100 | N = 1,000 | vs Dense |
|---|---|---|---|---|---|
| Dense (current) | 8.29 MB | 82.9 MB | 829 MB | 8.3 GB | 1× |
| Local Crop + Offset | 6.4 KB | 64 KB | 640 KB | 6.4 MB | 1,300× |
| Crop RLE | ~2 KB (240 runs × int32 pair) | 20 KB | 200 KB | 2 MB | 4,000× |
| Polygon ⚠ lossy | ~3.2 KB (400 pts × float32 xy) | 32 KB | 320 KB | 3.2 MB | 2,600× |
| memmap | 8.29 MB (disk) | 82.9 MB | 829 MB | 8.3 GB | 1× (disk) |

Notes:

  • Local Crop stores the full bounding-box rectangle (including background within bbox); the ~35%
    non-mask pixels are wasted. Crop RLE only encodes actual runs, so it wins for sparse objects.
  • Polygon vertex count grows with shape complexity; holes or disconnected contours add extra
    contours. For a mask with k holes: vertices ≈ (k+1) × avg_contour_length.
  • memmap saves RAM by paging to disk but total data is identical to Dense.

Encode time: dense → format

Source is a single (H, W) bool array per mask. xyxy bounds are known.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(A) — strided slice from xyxy | ~0.1 ms | ~1 ms | ~10 ms |
| Crop RLE | O(A) — scan crop rows for runs | ~0.2 ms | ~2 ms | ~20 ms |
| Polygon | O(P) — cv2.findContours on crop | ~2 ms | ~20 ms | ~200 ms |
| memmap | O(I) — write 8.29 MB to disk | ~80 ms | ~800 ms | ~8,000 ms |

Notes:

  • Local Crop reads ~307 KB per mask from the source array (80 rows × 3840-byte stride) but writes
    only 6.4 KB; dominated by strided read, not write.
  • Polygon encode is ~20× slower than local crop due to findContours overhead even on the crop.
  • memmap is write-bound by NVMe; unsuitable for real-time inference pipelines.

Decode time: format → full (H, W) mask

Required by: MaskAnnotator, mask_iou_batch, move_masks, resize_masks, merge().
The dominant cost for all formats is allocating and zeroing an 8.29 MB array.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(I) alloc-zeros + O(A) paste | ~3 ms | ~30 ms | ~300 ms |
| Crop RLE | O(I) alloc-zeros + O(A) expand + paste | ~3 ms | ~30 ms | ~300 ms |
| Polygon | O(I) alloc-zeros + O(A) fillPoly | ~5 ms | ~50 ms | ~500 ms |
| memmap | O(I) read from NVMe | ~80 ms | ~800 ms | ~8,000 ms |

Key insight: at 4K resolution, zeroing 8.29 MB per mask dominates for all in-memory formats.
Local Crop and Crop RLE are identical here — the format difference is irrelevant once you must
materialize the full image canvas.


Decode time: format → crop only (optimized path)

Possible when callers only need the object's local region, not the full image canvas.
Applicable to: contains_holes, contains_multiple_segments, filter_segments_by_distance,
area property, and — with coord-offset awareness — mask_to_polygons, MaskAnnotator crop path.

| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(1) — already in memory | 0 ms | 0 ms | 0 ms |
| Crop RLE | O(A) — expand 240 runs | ~0.02 ms | ~0.2 ms | ~2 ms |
| Polygon | O(A) — fillPoly on crop canvas | ~2 ms | ~20 ms | ~200 ms |
| memmap | N/A — always full-size | ~80 ms | ~800 ms | ~8,000 ms |

Local Crop is uniquely suited here: the stored crop is the decompressed form. No
decode step exists. This is the primary performance argument over Crop RLE for the common case.


IoU / NMS time

Pairs to evaluate = N(N-1)/2. At 1% bbox overlap rate (aerial, sparse scene):

| Format | Strategy | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Dense (current, resize to 640) | All pairs, 640² pixel AND | <1 ms | ~100 ms | ~10,000 ms |
| Local Crop + Offset | Bbox pre-filter → pixel IoU on intersection only | <1 ms | <1 ms | ~5 ms |
| Crop RLE | Bbox pre-filter → expand intersection crops | <1 ms | ~1 ms | ~15 ms |
| Polygon | Bbox pre-filter → rasterize intersection crops | <1 ms | ~10 ms | ~150 ms |
| memmap | Same as Dense but reads from disk | <1 ms | ~100 ms | ~10,000 ms |

Dense at N=1,000: 499,500 pairs × 409,600 (640²) pixels = 204 billion ops ≈ 10 s.
The current code mitigates this via configurable memory chunking, but the ops count is unchanged.

Local Crop with bbox pre-filter at N=1,000: ~5,000 overlapping pairs (1%) × ~400 px
intersection region each = 2M ops ≈ 2,000× faster. The 99% non-overlapping pairs cost O(1)
each (vectorized xyxy IoU check).
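The bbox pre-filter strategy can be sketched in numpy: an O(N²) vectorised box-overlap test first, then pixel IoU restricted to the intersection window of the few overlapping pairs (all names illustrative, not the PR's functions):

```python
import numpy as np

def boxes_overlap(xyxy: np.ndarray) -> np.ndarray:
    """Vectorised (N, N) bool matrix: True where bboxes intersect."""
    x1 = np.maximum(xyxy[:, None, 0], xyxy[None, :, 0])
    y1 = np.maximum(xyxy[:, None, 1], xyxy[None, :, 1])
    x2 = np.minimum(xyxy[:, None, 2], xyxy[None, :, 2])
    y2 = np.minimum(xyxy[:, None, 3], xyxy[None, :, 3])
    return (x2 > x1) & (y2 > y1)

def mask_iou_sparse(crops, offsets, xyxy):
    """Pixel IoU only for bbox-overlapping pairs, computed on the
    intersection window; non-overlapping pairs stay at IoU 0."""
    n = len(crops)
    iou = np.zeros((n, n))
    overlap = boxes_overlap(xyxy)
    for a in range(n):
        for b in range(a + 1, n):
            if not overlap[a, b]:
                continue
            # intersection window in image coordinates
            x1 = max(xyxy[a, 0], xyxy[b, 0]); y1 = max(xyxy[a, 1], xyxy[b, 1])
            x2 = min(xyxy[a, 2], xyxy[b, 2]); y2 = min(xyxy[a, 3], xyxy[b, 3])
            # shift the window into each crop's local coordinates
            wa = crops[a][y1 - offsets[a][1]:y2 - offsets[a][1],
                          x1 - offsets[a][0]:x2 - offsets[a][0]]
            wb = crops[b][y1 - offsets[b][1]:y2 - offsets[b][1],
                          x1 - offsets[b][0]:x2 - offsets[b][0]]
            inter = np.logical_and(wa, wb).sum()
            union = crops[a].sum() + crops[b].sum() - inter
            iou[a, b] = iou[b, a] = inter / union if union else 0.0
    return iou
```

The inner pixel work is proportional to the intersection area only, which is what collapses the N=1,000 cost from billions of pixel ops to a few million.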


Other properties

| Property | Dense | Local Crop + Offset | Crop RLE | Polygon | memmap |
|---|---|---|---|---|---|
| Lossless | ✓ | ✓ | ✓ | ✗ thin structures, 1-px features | ✓ |
| NumPy drop-in | ✓ | ✗ needs wrapper | ✗ needs wrapper | ✗ needs wrapper | ✓ |
| merge() across Detections | ✓ same H,W | ✓ with image_shape metadata | ✓ same | ✗ requires rasterization | ✓ same H,W |
| COCO interop | manual | manual | native | native | manual |
| Functions needing no change | all | area, contains_holes, contains_multiple_segments, filter_segments_by_distance | same | same | all |
| Functions needing updates | — | MaskAnnotator, mask_iou_batch, move_masks, calculate_centroids, resize_masks, mask_to_polygons, merge() | same + RLE decode | same + polygon rasterize | — |
| External deps | none | none | none | none (or pycocotools for C speed) | none |
| Implementation complexity | — | Low–Medium | Medium | Medium–High | Very low |
| Worst case | any large image | object fills image (A ≈ I) | complex texture (R ≈ A/2, checkerboard) | irregular / thin / holey shapes | I/O bottleneck |
| Best case | few large objects | many small objects (A ≪ I) | very sparse objects | smooth convex shapes | RAM-constrained system |

Summary

Local Crop + Offset is the right choice for the stated problem (aerial imagery, many small objects).

The decisive advantages:

  1. Memory — 1,300× smaller than dense for the stated scenario. RLE is ~3× smaller still but
    only for objects that are themselves sparse; for typical solid segmentation masks the difference
    is minor.
  2. NMS/IoU — the bbox pre-filter reduces computation from O(N² × I) to O(N² + N_overlap × A),
    a 2,000× speedup at N=1,000 in sparse scenes.
  3. Crop-only decode — O(1), enabling topology analysis (contains_holes, etc.) and eventually
    a crop-aware MaskAnnotator without ever allocating a full image-sized array.
  4. Encode is free — slice from xyxy bounds already present in Detections.

Crop RLE is worth considering only if object masks are themselves internally sparse (e.g.,
mesh objects, grid patterns) or if COCO RLE interop is a hard requirement. For typical solid
instance-segmentation masks in aerial imagery it adds decode overhead with minimal space gain.

Polygon is a non-starter for lossless use cases: encode cost is ~20× higher than local crop, decode requires fillPoly, and fine detail is lost.

memmap solves a different problem (RAM exhaustion via swap) without reducing data size and
adds I/O latency at every access. Not applicable here.

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a new CompactMask representation (crop + RLE) and integrates it across supervision so instance segmentation masks can be stored/processed more memory-efficiently while remaining largely compatible with existing Detections/annotators/utilities.

Changes:

  • Added CompactMask implementation with RLE encode/decode, slicing, merging, and offset translation support.
  • Updated Detections.merge, Detections.area, validators, annotators, inference slicer movement, and mask utilities/metrics to accept and efficiently handle CompactMask.
  • Added unit + integration tests covering CompactMask and its interaction with Detections and annotators.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

| File | Description |
|---|---|
| src/supervision/detection/compact_mask.py | New CompactMask type with crop-RLE storage and numpy interop. |
| src/supervision/detection/core.py | Enables merge/area paths to preserve and compute efficiently with CompactMask. |
| src/supervision/validators/__init__.py | Updates mask validation to accept CompactMask via a length check. |
| src/supervision/annotators/core.py | Optimizes MaskAnnotator to paint CompactMask crops without full-mask allocation. |
| src/supervision/detection/utils/masks.py | Extends centroid computation to support CompactMask without full materialization. |
| src/supervision/metrics/utils/object_size.py | Extends mask size categorization to use CompactMask.area. |
| src/supervision/detection/tools/inference_slicer.py | Translates CompactMask masks by adjusting offsets (no dense conversion). |
| src/supervision/__init__.py | Exposes CompactMask at package root import level. |
| tests/detection/test_compact_mask.py | New unit test suite for RLE helpers and CompactMask behaviors. |
| tests/detection/test_compact_mask_integration.py | New integration tests for Detections + CompactMask + annotators/merge. |

@codecov

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 95.17426% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 77%. Comparing base (c010656) to head (b2234da).

Additional details and impacted files
```text
@@           Coverage Diff            @@
##           develop   #2159    +/-   ##
========================================
+ Coverage       75%     77%    +2%
========================================
  Files           62      63     +1
  Lines         7545    7885   +340
========================================
+ Hits          5648    6034   +386
+ Misses        1897    1851    -46
```

Dense (N, H, W) bool masks cause OOM for aerial imagery (1000 objects x
4K image ~ 8.3 GB). CompactMask encodes each mask as a run-length
sequence of its bounding-box crop, reducing typical usage to ~2 MB.

- New `CompactMask` class with full duck-typed ndarray interface:
  `__getitem__`, `__array__`, `shape`, `dtype`, `area`, `sum`, `merge`,
  `with_offset` — drop-in compatible with existing `np.ndarray` masks.
- Private row-major RLE helpers: `_rle_encode`, `_rle_decode`, `_rle_area`.
- Phase 2 integration: `Detections` accepts CompactMask for `mask` field;
  `validate_mask`, `area` property, and `Detections.merge` all handle it.
- Phase 3 optimised paths: `calculate_masks_centroids` uses crop-space
  arithmetic; `MaskAnnotator` paints crop regions directly; `move_detections`
  uses `with_offset` instead of materialising dense masks; `get_mask_size_category`
  uses `mask.area`.
- 54 new tests (41 unit + 13 integration); all 17 doctests pass.
- All 1190 existing tests pass; pre-commit hooks clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
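As a rough illustration of the duck-typed surface listed above (`from_dense`, `__getitem__`, `__array__`, `shape`, `area`), here is a heavily simplified crop+offset store — a sketch of the interface only, not the PR's CompactMask implementation:

```python
import numpy as np

class MiniCompactMask:
    """Toy crop+offset mask store mimicking part of the duck-typed
    ndarray surface (illustrative; real class lives in the PR)."""

    def __init__(self, crops, offsets, image_shape):
        self.crops = list(crops)                        # (h, w) bool crops
        self.offsets = np.asarray(offsets, dtype=int)   # (N, 2) as (x, y)
        self.image_shape = image_shape                  # (H, W)

    @classmethod
    def from_dense(cls, masks: np.ndarray):
        """Encode each (H, W) mask as its tight bbox crop plus offset."""
        crops, offsets = [], []
        for m in masks:
            ys, xs = np.nonzero(m)
            if ys.size == 0:
                crops.append(np.zeros((0, 0), dtype=bool))
                offsets.append((0, 0))
                continue
            y1, y2, x1, x2 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
            crops.append(m[y1:y2, x1:x2].copy())
            offsets.append((x1, y1))
        return cls(crops, offsets, masks.shape[1:])

    @property
    def shape(self):
        return (len(self.crops), *self.image_shape)

    @property
    def area(self):
        # per-mask pixel count from crops alone — no full materialization
        return np.array([c.sum() for c in self.crops])

    def __len__(self):
        return len(self.crops)

    def __getitem__(self, idx: int) -> np.ndarray:
        # decode a single mask onto a full-size canvas
        canvas = np.zeros(self.image_shape, dtype=bool)
        x1, y1 = self.offsets[idx]
        h, w = self.crops[idx].shape
        canvas[y1:y1 + h, x1:x1 + w] = self.crops[idx]
        return canvas

    def __array__(self, dtype=None, copy=None):
        # dense (N, H, W) view for code paths that insist on ndarray
        out = np.stack([self[i] for i in range(len(self))])
        return out.astype(dtype) if dtype is not None else out
```

`np.asarray(mini_mask)` materializes the dense stack via `__array__`, which is what makes such a wrapper "drop-in compatible" at the cost of the full allocation.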
Borda and others added 5 commits March 2, 2026 20:38
Co-authored-by: Codex <codex@openai.com>
…tion

- Add `compact_mask_iou_batch` for optimised IoU computation on RLE crops (avoiding full (N, H, W) arrays).
- Enhance `mask_iou_batch` and NMS routines to support CompactMask inputs.
- Introduce `compact_masks` parameter in `InferenceSlicer` for end-to-end CompactMask handling.
- Update docstrings across affected components to reflect CompactMask integration.
@Borda Borda self-assigned this Mar 10, 2026
Borda and others added 3 commits March 10, 2026 10:40
…er integration

- Add correctness and integration tests for `compact_mask_iou_batch`, ensuring exact match with dense IoU results across multiple cases.
- Validate NMS behavior with CompactMask inputs for both isolated and overlapping masks.
- Introduce end-to-end tests in `InferenceSlicer` with `compact_masks=True`, verifying pipeline consistency against dense masks.
Member Author

Borda commented Mar 11, 2026

```text
                                                     CompactMask — benchmark summary
╭───────────────┬─────────┬──────────────┬───────┬────────────┬───────────┬───────────┬─────────┬─────────┬───────────┬──────────┬──────╮
│               │         │              │       │            │   Compact │   Compact │     Mem │    Area │    Filter │    Annot │      │
│ Scenario      │ Objects │ Resolution   │  Fill │  Dense mem │    theory │    actual │     (x) │     (x) │       (x) │      (x) │ OK?  │
├───────────────┼─────────┼──────────────┼───────┼────────────┼───────────┼───────────┼─────────┼─────────┼───────────┼──────────┼──────┤
│ FHD-100-5%    │     100 │ 1920x1080    │    5% │   207.4 MB │     33 KB │     63 KB │   6327x │  259.5x │    475.3x │    70.8x │  ✓   │
│ FHD-100-10%   │     100 │ 1920x1080    │   10% │   207.4 MB │     45 KB │     87 KB │   4564x │  275.1x │    424.6x │    48.6x │  ✓   │
│ FHD-100-20%   │     100 │ 1920x1080    │   20% │   207.4 MB │     67 KB │    137 KB │   3108x │  265.6x │    417.6x │    28.6x │  ✓   │
│ 4K-500-5%     │     500 │ 3840x2160    │    5% │  4147.2 MB │    139 KB │    250 KB │  29860x │ 1169.0x │   3559.4x │   421.3x │  ✓   │
│ 4K-500-10%    │     500 │ 3840x2160    │   10% │  4147.2 MB │    193 KB │    304 KB │  21489x │ 1125.8x │   1940.7x │   259.1x │  ✓   │
│ 4K-500-20%    │     500 │ 3840x2160    │   20% │  4147.2 MB │    284 KB │    403 KB │  14616x │ 1094.7x │   3711.7x │   134.4x │  ✓   │
│ 4K-1000-5%    │    1000 │ 3840x2160    │    5% │  8294.4 MB │    189 KB │    411 KB │  43919x │ 1145.9x │   6516.5x │   686.7x │  ✓   │
│ 4K-1000-10%   │    1000 │ 3840x2160    │   10% │  8294.4 MB │    277 KB │    498 KB │  29994x │ 1126.7x │   5935.1x │   437.5x │  ✓   │
│ 4K-1000-20%   │    1000 │ 3840x2160    │   20% │  8294.4 MB │    384 KB │    605 KB │  21606x │ 1114.7x │   5224.1x │   275.2x │  ✓   │
│ SAT-200-5%    │     200 │ 8192x8192    │    5% │ 13421.8 MB │    271 KB │    485 KB │  49454x │ 8378.0x │  18414.0x │   164.4x │  ✓   │
│ SAT-200-10%   │     200 │ 8192x8192    │   10% │ 13421.8 MB │    388 KB │    813 KB │  34574x │ 8567.8x │  12891.4x │   100.6x │  ✓   │
│ SAT-200-20%   │     200 │ 8192x8192    │   20% │ 13421.8 MB │    559 KB │   1418 KB │  23992x │ 7874.6x │  15730.2x │    59.6x │  ✓   │
╰───────────────┴─────────┴──────────────┴───────┴────────────┴───────────┴───────────┴─────────┴─────────┴───────────┴──────────┴──────╯
```

·  Compact theor. — sum of internal numpy buffer sizes
·  Compact actual — tracemalloc peak during .from_dense() (w/ Python overhead)
·  Mem x — dense / compact theoretical ratio
·  Area x — .area speedup (RLE sum, no materialisation)
·  Filter x — boolean-index speedup
·  Annot x — MaskAnnotator speedup (crop-paint vs full-frame alloc)
·  italic ms — dense skipped (array > 16 GB), compact absolute time shown

Adds examples/compact_mask/ with a standalone benchmark that demonstrates
CompactMask as a drop-in replacement for dense (N,H,W) bool mask arrays,
covering FHD / 4K / satellite (8192×8192) tiers at 5, 10, and 20 % fill.

Benchmark highlights:
- tracemalloc-based real memory measurement alongside theoretical nbytes
- DENSE_SKIP_GB threshold (12 GB) prevents swap thrashing on SAT scenarios
- LRU-cached synthetic mask generation (ellipses via cv2.ellipse)
- Staged design: stage_build / stage_area / stage_filter / stage_annotate /
  stage_correctness for clear separation of concerns
- Rich summary table with Compact theor. vs Compact actual columns
- All non-skipped scenarios verified: pixel-perfect annotation, exact area,
  lossless to_dense() roundtrip

README covers motivation, theoretical space/encode/decode/IoU analysis from
the PR design doc, drop-in API examples, and known limitations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.



Borda and others added 4 commits March 11, 2026 15:33
- Add 5 new benchmark stages: iou, nms, merge, offset, centroids
- Add tracemalloc measurement for dense masks (theory vs malloc split)
- Add per-scenario JSONL result persistence (nan → null, timestamped)
- Add parallel timing via ThreadPoolExecutor (REPETITIONS=6, PARALLEL=3)
- Add gc.collect() before each timing rep and between scenarios
- Remove functools.cache from make_detections (caused 150 GB RAM usage)
- Colour-code speedup ratios: green ≥10x, yellow 1-10x, red <1x
- Rename theor. → theory in table headers; add att./op. type labels
- Fix stage_offset broadcast error by expanding canvas by offset amount
- Fix correctness display with proper f-string concatenation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, and segments

- Add NMM tests for CompactMask, ensuring numerical consistency with dense input.
- Add `calculate_masks_centroids` tests, validating exact results across both paths.
- Add `contains_holes` and `contains_multiple_segments` tests, verifying behavior after encode-decode roundtrip.
- Refactor indexing logic in CompactMask for performance and maintainability.
- Simplify CompactMask concatenation by removing redundant `.astype()` calls.
CompactMask.repack(): re-encodes each mask crop using tight bounding
boxes, eliminating background padding from loose detector bboxes.
O(sum of crop areas); useful as a one-time cleanup after accumulating
many InferenceSlicer tile merges.

Detections.is_empty() fast path: avoids calling __eq__ which
materialised the full (N, H, W) CompactMask array just to check
emptiness — turning an O(N·H·W) check into O(1).  This was the root
cause of the 0.56x merge regression.

CompactMask.merge() now uses list.extend (C-level) instead of a flat
list comprehension, reducing Python bytecode overhead under GIL
contention.

benchmark: pre-compute half-splits outside the timed lambda so
stage_merge measures only the concatenation, not the slicing.

New tests: repack() (4 cases), NMM parity (TestNmmWithCompactMask),
centroids parity (TestCalculateMasksCentroidsCompact), contains_holes
and contains_multiple_segments roundtrip parity after CompactMask
encode/decode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Borda and others added 3 commits March 11, 2026 19:06
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Update `calculate_masks_centroids` to assign centroids of (0, 0) for all-zero tight crops, avoiding division by zero and ensuring consistency with dense implementation.
- Refine indexing logic in `CompactMask` to support Python `list[bool]` as a mask selector.
- Add tests for empty masks and boolean list indexing to ensure correctness and parity across scenarios.
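The crop-space centroid arithmetic mentioned above works because translation commutes with the mean: compute the centroid inside the crop, then add the offset (with the same (0, 0) fallback for all-zero crops). A sketch with an illustrative helper name, not the library's function:

```python
import numpy as np

def centroid_from_crop(crop: np.ndarray, offset: tuple[int, int]):
    """Centroid in full-image coordinates using only the crop.

    Because the offset is an integer translation, int(mean(local)) + offset
    equals int(mean(global)), so this matches the dense-path result.
    All-zero crops fall back to (0, 0), mirroring the behavior above.
    """
    if crop.sum() == 0:
        return (0, 0)
    ys, xs = np.nonzero(crop)
    x_off, y_off = offset
    return (int(xs.mean()) + x_off, int(ys.mean()) + y_off)
```

The cost is O(crop area) per mask rather than O(image area), which is where the centroid speedup in the benchmarks comes from.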
…ions

- Introduce `bbox_xyxy` property to compute inclusive bounding boxes for masks, enabling better metadata access and usability.
- Refine type annotations for variables like `centroids`, `flat`, and `result` to ensure clarity and type safety.
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.



Borda and others added 9 commits March 11, 2026 19:19
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Refactor `with_offset` to clip partially or fully out-of-frame masks, ensuring they remain valid and consistent with `move_masks` behavior.
- Add iterator support to `CompactMask` for generating dense boolean arrays.
- Update `InferenceSlicer` to handle `CompactMask` offsets without dense materialization.
- Introduce extensive tests to validate clipping behavior and parity with `move_masks`.
Remove hard line-wraps from all prose paragraphs — lines now flow as
single lines. Add "Operation-by-Operation Speedup Analysis" section
covering Memory, .area, filter/__getitem__, annotate, IoU, NMS, merge,
with_offset, and centroids with numbered compounding-factor tables and
expected speedups for each.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When no crop overflows the new canvas — the common case in
InferenceSlicer where the canvas is expanded by the tile offset —
with_offset() now runs in O(N): one numpy broadcast to add (dx, dy) to
the offsets array, a vectorised bounds check, and a shared-RLE return.
No RLE data is decoded or re-encoded.

Only masks that genuinely straddle the image boundary go through the
slow decode+clip+re-encode path. This brings with_offset from 0.67x
(slower than dense) to >1 000x faster in the no-clip case.

Update examples/compact_mask/README.md to reflect the new fast path
description and summary table speedup (~40x → >1 000x).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
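The no-clip fast path described above reduces to one broadcast add plus a vectorised bounds check. A sketch under the assumption that offsets are an (N, 2) array of (x, y) pairs and crop shapes are (h, w) tuples (names illustrative):

```python
import numpy as np

def with_offset_fast(offsets, crop_shapes, image_shape, dx, dy):
    """O(N) fast path for shifting every mask by (dx, dy).

    Returns the shifted (N, 2) offsets array — the RLE/crop payloads are
    untouched and can be shared — or None when any crop would straddle
    the canvas edge, signalling the slow decode+clip+re-encode path.
    """
    shifted = np.asarray(offsets, dtype=int) + np.array([dx, dy])
    sizes = np.asarray(crop_shapes, dtype=int)   # (N, 2) as (h, w)
    img_h, img_w = image_shape
    in_bounds = (
        (shifted[:, 0] >= 0) & (shifted[:, 1] >= 0)
        & (shifted[:, 0] + sizes[:, 1] <= img_w)
        & (shifted[:, 1] + sizes[:, 0] <= img_h)
    )
    return shifted if in_bounds.all() else None
```

In the InferenceSlicer case the canvas is expanded by the tile offset, so the bounds check always passes and no per-pixel work happens.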
…erence

Compact NMS uses exact full-res crop IoU while dense NMS downsamples to
640px first. Borderline pairs near the 0.5 threshold can flip between the
two paths — this is a quality improvement in compact, not a bug.

Changes:
- stage_nms now returns a 4-tuple (dense_s, compact_s, nms_ok, n_diff)
- nms_ok is strict (n_diff == 0) — no silent tolerance
- nms_mismatch_count field added to ScenarioResult for JSON logging
- Correctness display shows nms=✗(N) with the exact count so it's clear
  how many decisions differ and whether it's a rounding artefact (1-3)
  or a real bug (many more)
- stage_nms docstring explains the resize-vs-exact quality difference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five new test classes covering the full CompactMask surface against dense
ground truth, each parametrised over 10 seeds (seeds 0-9) with varying
object counts (N=1,5,20,50) and image sizes (50x50 to 1080x1920):

- TestCompactMaskRoundtripRandom  — from_dense→to_dense pixel equality,
  shape/len, and per-index access
- TestCompactMaskAreaRandom       — .area and .sum(axis=(1,2)) match dense
- TestCompactMaskFilterRandom     — boolean and integer-list filter parity
- TestCompactMaskWithOffsetRandom — with_offset matches move_masks for
  random offsets including partial and full out-of-frame cases
- TestCompactMaskIouRandom        — compact_mask_iou_batch matches dense
  mask_iou_batch; self-IoU diagonal is 1.0; tight-bbox parity

All 198 tests pass in <1 s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Across compact_mask.py, iou_and_nms.py, and both test files:

- n → num_masks (or num_pixels in _rle_decode)
- h, w → img_h, img_w
- i (loop) → mask_idx
- i, j (iou pair loop) → idx_a, idx_b
- i (chunked loop) → chunk_start
- i (nms loop) → row_idx
- m (merge loop) → cm
- r (area comprehension) → rle
- a, b (mask arrays in tests) → masks_a, masks_b
- k (selected count) → num_selected
- g, d (jaccard loop) → gt_box, det_box

Coordinate shorthands (x1, y1, dx, dy, ix1, iy1, etc.) left unchanged
as they are standard and unambiguous in geometric contexts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssion

Dense masks were resized to 640 px before IoU computation, while CompactMask
used exact full-resolution crop IoU. For borderline pairs whose true IoU is
close to the threshold, the downscaling flipped keep/suppress decisions.

Fix: call mask_iou_batch directly on full-resolution masks for both paths.
mask_dimension parameter kept for backward compatibility but is now a no-op.

Add regression test at 1920x1080 with a borderline pair near IoU=0.5 to
prevent recurrence. Existing tests used ≤40x40 images where resize upscaled
(no information loss), so the lossy code path was never exercised.

Also revise benchmark parameter matrix: FHD-200/400, 4K-100, SAT-200 tiers;
fill fractions [0.05, 0.20, 0.50] to match real supervision/SAM-2 use cases;
IOU_DENSE_SKIP_GB=1.0 so IoU+NMS dense timing is only run for sub-1 GB tiers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NMS section: remove resize_masks/640px approximation (bug was fixed —
  both paths now call mask_iou_batch directly with exact IoU)
- Operating point: replace nonexistent 4K-500-5% with FHD-200-50%-v600
  as the primary reference scenario throughout all analysis sections
- Per-operation speedups: cite measured values from new benchmark run
  (.area 176x, filter 467x, annotate 26x, iou 464x, nms 109x,
   merge 908x, offset 2214x, centroids 19x at FHD-200-50%-v600;
   SAT-200 extremes: merge 272709x, offset 183199x)
- Tier table: 3 tiers → 6 tiers (FHD-100/200/400, 4K-100/200, SAT-200);
  fill fractions 5/10/20% → 5/20/50% (sparse/moderate/SAM-everything)
- Sample results table: 5 rows → 8 representative rows covering full
  range; add Area/Filter/Annot/IoU/NMS/Merge/Offset speedup columns;
  update skip threshold footnote (IOU_DENSE_SKIP_GB=1.0, not 12 GB)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>