# Mask Storage Format Comparison

## Assumptions

Encode/decode times are per-batch estimates; per-mask overhead (Python dispatch, NumPy call) is not included.

## Decode time: format → full
| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(I) alloc-zeros + O(A) paste | ~3 ms | ~30 ms | ~300 ms |
| Crop RLE | O(I) alloc-zeros + O(A) expand + paste | ~3 ms | ~30 ms | ~300 ms |
| Polygon | O(I) alloc-zeros + O(A) fillPoly | ~5 ms | ~50 ms | ~500 ms |
| memmap | O(I) read from NVMe | ~80 ms | ~800 ms | ~8,000 ms |
Key insight: at 4K resolution, zeroing 8.29 MB per mask dominates for all in-memory formats.
Local Crop and Crop RLE are identical here — the format difference is irrelevant once you must
materialize the full image canvas.
## Decode time: format → crop only (optimized path)
Possible when callers only need the object's local region, not the full image canvas.
Applicable to: contains_holes, contains_multiple_segments, filter_segments_by_distance,
area property, and — with coord-offset awareness — mask_to_polygons, MaskAnnotator crop path.
| Format | Complexity | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Local Crop + Offset | O(1) — already in memory | 0 ms | 0 ms | 0 ms |
| Crop RLE | O(A) — expand 240 runs | ~0.02 ms | ~0.2 ms | ~2 ms |
| Polygon | O(A) — fillPoly on crop canvas | ~2 ms | ~20 ms | ~200 ms |
| memmap | N/A — always full-size | ~80 ms | ~800 ms | ~8,000 ms |
Local Crop is uniquely suited here: the stored crop is the decompressed form. No
decode step exists. This is the primary performance argument over Crop RLE for the common case.
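A minimal sketch of the Local Crop + Offset idea, assuming a hypothetical `LocalCropMask` wrapper (the name and fields are illustrative, not the PR's API). It shows why crop-only access is O(1)/O(A) while full decode pays the O(I) canvas-zeroing cost regardless of object size:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LocalCropMask:
    """Illustrative Local Crop + Offset container (hypothetical names)."""
    crop: np.ndarray              # (h, w) bool array, tight around the object
    offset: tuple[int, int]       # (x, y) of the crop's top-left in the image
    image_shape: tuple[int, int]  # (H, W) of the full canvas

    @property
    def area(self) -> int:
        # Crop-only path: operates on the stored crop, no decode step
        return int(self.crop.sum())

    def to_full(self) -> np.ndarray:
        # Full decode: the O(I) np.zeros dominates (8.29 MB per mask at 4K)
        full = np.zeros(self.image_shape, dtype=bool)
        x, y = self.offset
        h, w = self.crop.shape
        full[y:y + h, x:x + w] = self.crop  # O(A) paste
        return full

m = LocalCropMask(np.ones((4, 6), dtype=bool), (10, 20), (2160, 3840))
assert m.area == 24            # no full-canvas allocation
assert m.to_full().sum() == 24 # full decode pays O(I)
```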
## IoU / NMS time
Pairs to evaluate = N(N-1)/2. At 1% bbox overlap rate (aerial, sparse scene):
| Format | Strategy | N = 10 | N = 100 | N = 1,000 |
|---|---|---|---|---|
| Dense (current, resize to 640) | All pairs, 640² pixel AND | <1 ms | ~100 ms | ~10,000 ms |
| Local Crop + Offset | Bbox pre-filter → pixel IoU on intersection only | <1 ms | <1 ms | ~5 ms |
| Crop RLE | Bbox pre-filter → expand intersection crops | <1 ms | ~1 ms | ~15 ms |
| Polygon | Bbox pre-filter → rasterize intersection crops | <1 ms | ~10 ms | ~150 ms |
| memmap | Same as Dense but reads from disk | <1 ms | ~100 ms | ~10,000 ms |
Dense at N=1,000: 499,500 pairs × 409,600 (640²) pixels = 204 billion ops ≈ 10 s.
The current code mitigates this via configurable memory chunking, but the ops count is unchanged.
Local Crop with bbox pre-filter at N=1,000: ~5,000 overlapping pairs (1%) × ~400 px
intersection region each = 2M ops ≈ 2,000× faster. The 99% non-overlapping pairs cost O(1)
each (vectorized xyxy IoU check).
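The pre-filter strategy above can be sketched as follows — a simplified illustration, not the library's `mask_iou_batch`: a vectorized xyxy intersection test gates the per-pixel work, and overlapping pairs compute intersection counts only over the bbox intersection rectangle (function and argument names are assumptions for the example):

```python
import numpy as np

def crop_iou_with_bbox_prefilter(boxes, masks, offsets):
    """Pairwise mask IoU with a vectorized bbox pre-filter (sketch).

    boxes: (N, 4) int xyxy; masks[i]: bool crop for box i; offsets: (N, 2)
    (x, y) crop top-left corners. Non-overlapping pairs cost O(1); overlapping
    pairs pay per-pixel work only over the bbox intersection region.
    """
    n = len(boxes)
    iou = np.zeros((n, n))
    # Vectorized xyxy intersection test — O(1) per pair
    ix1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    iy1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    ix2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    iy2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    overlaps = (ix2 > ix1) & (iy2 > iy1)
    for i, j in zip(*np.nonzero(np.triu(overlaps, k=1))):
        ax, ay = offsets[i]
        bx, by = offsets[j]
        # Slice each crop down to the shared intersection rectangle
        sub_a = masks[i][iy1[i, j] - ay:iy2[i, j] - ay,
                         ix1[i, j] - ax:ix2[i, j] - ax]
        sub_b = masks[j][iy1[i, j] - by:iy2[i, j] - by,
                         ix1[i, j] - bx:ix2[i, j] - bx]
        inter = np.logical_and(sub_a, sub_b).sum()
        union = masks[i].sum() + masks[j].sum() - inter
        iou[i, j] = iou[j, i] = inter / union if union else 0.0
    return iou

# Two overlapping 4x4 squares and one far-away square
boxes = np.array([[0, 0, 4, 4], [2, 0, 6, 4], [10, 10, 14, 14]])
masks = [np.ones((4, 4), dtype=bool) for _ in range(3)]
offsets = np.array([[0, 0], [2, 0], [10, 10]])
iou = crop_iou_with_bbox_prefilter(boxes, masks, offsets)
```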
## Other properties
| Property | Dense | Local Crop + Offset | Crop RLE | Polygon | memmap |
|---|---|---|---|---|---|
| Lossless | ✓ | ✓ | ✓ | ✗ thin structures, 1-px features | ✓ |
| NumPy drop-in | ✓ | ✗ needs wrapper | ✗ needs wrapper | ✗ needs wrapper | ✓ |
| merge() across Detections | ✓ same H,W | ✓ with image_shape metadata | ✓ same | ✗ requires rasterization | ✓ |
| COCO interop | manual | manual | native | native | manual |
| Functions needing no change | all | area, contains_holes, contains_multiple_segments, filter_segments_by_distance | same | same | all |
| Functions needing updates | — | MaskAnnotator, mask_iou_batch, move_masks, calculate_centroids, resize_masks, mask_to_polygons, merge() | same + RLE decode | same + polygon rasterize | — |
| External deps | none | none | none | none (or pycocotools for C speed) | none |
| Implementation complexity | — | Low–Medium | Medium | Medium–High | Very low |
| Worst case | any large image | object fills image (A ≈ I) | complex texture (R ≈ A/2, checkerboard) | irregular / thin / holey shapes | I/O bottleneck |
| Best case | few large objects | many small objects (A ≪ I) | very sparse objects | smooth convex shapes | RAM-constrained system |
## Summary
Local Crop + Offset is the right choice for the stated problem (aerial imagery, many small objects).
The decisive advantages:
- Memory — 1,300× smaller than dense for the stated scenario. RLE is ~3× smaller still, but only for objects that are themselves sparse; for typical solid segmentation masks the difference is minor.
- NMS/IoU — the bbox pre-filter reduces computation from O(N² × I) to O(N² + N_overlap × A), a 2,000× speedup at N=1,000 in sparse scenes.
- Crop-only decode — O(1), enabling topology analysis (contains_holes, etc.) and eventually a crop-aware `MaskAnnotator` without ever allocating a full image-sized array.
- Encode is free — a slice from the xyxy bounds already present in `Detections`.
Crop RLE is worth considering only if object masks are themselves internally sparse (e.g.,
mesh objects, grid patterns) or if COCO RLE interop is a hard requirement. For typical solid
instance-segmentation masks in aerial imagery it adds decode overhead with minimal space gain.
Polygon is a non-starter for lossless use cases: encode cost is 20× higher than local crop,
decode requires fillPoly, and fine detail is lost.
memmap solves a different problem (RAM exhaustion via swap) without reducing data size and
adds I/O latency at every access. Not applicable here.
Pull request overview
This PR introduces a new CompactMask representation (crop + RLE) and integrates it across supervision so instance segmentation masks can be stored/processed more memory-efficiently while remaining largely compatible with existing Detections/annotators/utilities.
Changes:
- Added `CompactMask` implementation with RLE encode/decode, slicing, merging, and offset translation support.
- Updated `Detections.merge`, `Detections.area`, validators, annotators, inference slicer movement, and mask utilities/metrics to accept and efficiently handle `CompactMask`.
- Added unit + integration tests covering `CompactMask` and its interaction with `Detections` and annotators.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/supervision/detection/compact_mask.py | New CompactMask type with crop-RLE storage and numpy interop. |
| src/supervision/detection/core.py | Enables merge/area paths to preserve and compute efficiently with CompactMask. |
| src/supervision/validators/__init__.py | Updates mask validation to accept CompactMask via a length check. |
| src/supervision/annotators/core.py | Optimizes MaskAnnotator to paint CompactMask crops without full-mask allocation. |
| src/supervision/detection/utils/masks.py | Extends centroid computation to support CompactMask without full materialization. |
| src/supervision/metrics/utils/object_size.py | Extends mask size categorization to use CompactMask.area. |
| src/supervision/detection/tools/inference_slicer.py | Translates CompactMask masks by adjusting offsets (no dense conversion). |
| src/supervision/__init__.py | Exposes CompactMask at package root import level. |
| tests/detection/test_compact_mask.py | New unit test suite for RLE helpers and CompactMask behaviors. |
| tests/detection/test_compact_mask_integration.py | New integration tests for Detections + CompactMask + annotators/merge. |
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

@@ Coverage Diff @@
##           develop   #2159   +/- ##
========================================
+ Coverage      75%      77%     +2%
========================================
  Files          62       63      +1
  Lines        7545     7885    +340
========================================
+ Hits         5648     6034    +386
+ Misses       1897     1851     -46
Dense (N, H, W) bool masks cause OOM for aerial imagery (1,000 objects × 4K image ≈ 8.3 GB). CompactMask encodes each mask as a run-length sequence of its bounding-box crop, reducing typical usage to ~2 MB.

- New `CompactMask` class with full duck-typed ndarray interface: `__getitem__`, `__array__`, `shape`, `dtype`, `area`, `sum`, `merge`, `with_offset` — drop-in compatible with existing `np.ndarray` masks.
- Private row-major RLE helpers: `_rle_encode`, `_rle_decode`, `_rle_area`.
- Phase 2 integration: `Detections` accepts CompactMask for the `mask` field; `validate_mask`, the `area` property, and `Detections.merge` all handle it.
- Phase 3 optimised paths: `calculate_masks_centroids` uses crop-space arithmetic; `MaskAnnotator` paints crop regions directly; `move_detections` uses `with_offset` instead of materialising dense masks; `get_mask_size_category` uses `mask.area`.
- 54 new tests (41 unit + 13 integration); all 17 doctests pass.
- All 1,190 existing tests pass; pre-commit hooks clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
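For context, a row-major RLE of a crop can be sketched as below. This mirrors what helpers such as `_rle_encode` / `_rle_decode` typically do (alternating run lengths, first run counting zeros, as in COCO's convention) — it is an illustrative sketch, not the PR's actual implementation:

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> np.ndarray:
    """Row-major RLE of a bool crop: alternating run lengths, first run = zeros."""
    flat = np.asarray(mask, dtype=bool).ravel()
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1   # run boundaries
    bounds = np.concatenate(([0], change, [flat.size]))
    runs = np.diff(bounds)
    if flat.size and flat[0]:        # keep the "starts with zeros" convention
        runs = np.concatenate(([0], runs))
    return runs

def rle_decode(runs: np.ndarray, shape: tuple[int, int]) -> np.ndarray:
    values = np.zeros(len(runs), dtype=bool)
    values[1::2] = True              # runs alternate False, True, False, ...
    return np.repeat(values, runs).reshape(shape)

crop = np.array([[0, 1, 1], [1, 1, 0]], dtype=bool)
assert np.array_equal(rle_decode(rle_encode(crop), crop.shape), crop)
```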
Co-authored-by: Codex <codex@openai.com>
…tion

- Add `compact_mask_iou_batch` for optimised IoU computation on RLE crops (avoiding full (N, H, W) arrays).
- Enhance `mask_iou_batch` and NMS routines to support CompactMask inputs.
- Introduce a `compact_masks` parameter in `InferenceSlicer` for end-to-end CompactMask handling.
- Update docstrings across affected components to reflect CompactMask integration.
…er integration

- Add correctness and integration tests for `compact_mask_iou_batch`, ensuring exact match with dense IoU results across multiple cases.
- Validate NMS behavior with CompactMask inputs for both isolated and overlapping masks.
- Introduce end-to-end tests in `InferenceSlicer` with `compact_masks=True`, verifying pipeline consistency against dense masks.
Adds examples/compact_mask/ with a standalone benchmark that demonstrates CompactMask as a drop-in replacement for dense (N, H, W) bool mask arrays, covering FHD / 4K / satellite (8192×8192) tiers at 5, 10, and 20% fill.

Benchmark highlights:

- tracemalloc-based real memory measurement alongside theoretical nbytes
- DENSE_SKIP_GB threshold (12 GB) prevents swap thrashing on SAT scenarios
- LRU-cached synthetic mask generation (ellipses via cv2.ellipse)
- Staged design: stage_build / stage_area / stage_filter / stage_annotate / stage_correctness for clear separation of concerns
- Rich summary table with "Compact theor." vs "Compact actual" columns
- All non-skipped scenarios verified: pixel-perfect annotation, exact area, lossless to_dense() roundtrip

README covers motivation, theoretical space/encode/decode/IoU analysis from the PR design doc, drop-in API examples, and known limitations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
- Add 5 new benchmark stages: iou, nms, merge, offset, centroids
- Add tracemalloc measurement for dense masks (theory vs malloc split)
- Add per-scenario JSONL result persistence (nan → null, timestamped)
- Add parallel timing via ThreadPoolExecutor (REPETITIONS=6, PARALLEL=3)
- Add gc.collect() before each timing rep and between scenarios
- Remove functools.cache from make_detections (caused 150 GB RAM usage)
- Colour-code speedup ratios: green ≥10×, yellow 1–10×, red <1×
- Rename "theor." → "theory" in table headers; add att./op. type labels
- Fix stage_offset broadcast error by expanding canvas by the offset amount
- Fix correctness display with proper f-string concatenation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, and segments

- Add NMM tests for CompactMask, ensuring numerical consistency with dense input.
- Add `calculate_masks_centroids` tests, validating exact results across both paths.
- Add `contains_holes` and `contains_multiple_segments` tests, verifying behavior after an encode-decode roundtrip.
- Refactor indexing logic in CompactMask for performance and maintainability.
- Simplify CompactMask concatenation by removing redundant `.astype()` calls.
- CompactMask.repack(): re-encodes each mask crop using tight bounding boxes, eliminating background padding from loose detector bboxes. O(sum of crop areas); useful as a one-time cleanup after accumulating many InferenceSlicer tile merges.
- Detections.is_empty() fast path: avoids calling __eq__, which materialised the full (N, H, W) CompactMask array just to check emptiness — turning an O(N·H·W) check into O(1). This was the root cause of the 0.56× merge regression.
- CompactMask.merge() now uses list.extend (C-level) instead of a flat list comprehension, reducing Python bytecode overhead under GIL contention.
- Benchmark: pre-compute half-splits outside the timed lambda so stage_merge measures only the concatenation, not the slicing.
- New tests: repack() (4 cases), NMM parity (TestNmmWithCompactMask), centroids parity (TestCalculateMasksCentroidsCompact), contains_holes and contains_multiple_segments roundtrip parity after CompactMask encode/decode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Update `calculate_masks_centroids` to assign centroids of (0, 0) for all-zero tight crops, avoiding division by zero and ensuring consistency with the dense implementation.
- Refine indexing logic in `CompactMask` to support Python `list[bool]` as a mask selector.
- Add tests for empty masks and boolean list indexing to ensure correctness and parity across scenarios.
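The crop-space centroid arithmetic can be sketched as below — a hypothetical standalone helper, not the library's `calculate_masks_centroids` (rounding behaviour here is an assumption): the centroid is computed on the crop only and shifted by the offset, with the (0, 0) convention for all-zero crops:

```python
import numpy as np

def crop_centroid(crop: np.ndarray, offset: tuple[int, int]) -> tuple[int, int]:
    """Image-space centroid from crop-space arithmetic (illustrative sketch)."""
    if crop.sum() == 0:
        return (0, 0)            # all-zero crop: avoid division by zero
    ys, xs = np.nonzero(crop)    # pixel coordinates within the crop
    x0, y0 = offset
    # Shift the crop-local centroid by the crop's top-left offset
    return (int(x0 + xs.mean()), int(y0 + ys.mean()))
```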
…ions

- Introduce a `bbox_xyxy` property to compute inclusive bounding boxes for masks, enabling better metadata access and usability.
- Refine type annotations for variables like `centroids`, `flat`, and `result` to ensure clarity and type safety.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Refactor `with_offset` to clip partially or fully out-of-frame masks, ensuring they remain valid and consistent with `move_masks` behavior.
- Add iterator support to `CompactMask` for generating dense boolean arrays.
- Update `InferenceSlicer` to handle `CompactMask` offsets without dense materialization.
- Introduce extensive tests to validate clipping behavior and parity with `move_masks`.
Remove hard line-wraps from all prose paragraphs — lines now flow as single lines. Add an "Operation-by-Operation Speedup Analysis" section covering Memory, .area, filter/__getitem__, annotate, IoU, NMS, merge, with_offset, and centroids, with numbered compounding-factor tables and expected speedups for each.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When no crop overflows the new canvas — the common case in InferenceSlicer, where the canvas is expanded by the tile offset — with_offset() now runs in O(N): one numpy broadcast to add (dx, dy) to the offsets array, a vectorised bounds check, and a shared-RLE return. No RLE data is decoded or re-encoded. Only masks that genuinely straddle the image boundary go through the slow decode+clip+re-encode path. This brings with_offset from 0.67× (slower than dense) to >1,000× faster in the no-clip case.

Update examples/compact_mask/README.md to reflect the new fast-path description and summary-table speedup (~40× → >1,000×).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
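The no-clip fast path described above can be sketched in a few lines — names and signature are assumptions for illustration, not the PR's `with_offset` itself: one broadcast add plus a vectorised bounds check that flags only the crops needing the slow clip path.

```python
import numpy as np

def shift_offsets_fast(offsets: np.ndarray, crop_sizes: np.ndarray,
                       dx: int, dy: int, new_shape: tuple[int, int]):
    """O(N) no-clip fast path for an offset shift (illustrative sketch).

    offsets: (N, 2) (x, y) top-left corners; crop_sizes: (N, 2) (w, h).
    Returns shifted offsets and a flag per mask marking crops that straddle
    the new canvas and therefore need the slow decode+clip+re-encode path.
    """
    shifted = offsets + np.array([dx, dy])       # one numpy broadcast add
    new_h, new_w = new_shape
    x2 = shifted[:, 0] + crop_sizes[:, 0]
    y2 = shifted[:, 1] + crop_sizes[:, 1]
    needs_clip = ((shifted[:, 0] < 0) | (shifted[:, 1] < 0) |
                  (x2 > new_w) | (y2 > new_h))   # vectorised bounds check
    return shifted, needs_clip

offsets = np.array([[0, 0], [100, 50]])
sizes = np.array([[10, 10], [20, 20]])
shifted, needs_clip = shift_offsets_fast(offsets, sizes, dx=64, dy=64,
                                         new_shape=(1080, 1920))
assert not needs_clip.any()   # common InferenceSlicer case: no clipping
```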
…erence

Compact NMS uses exact full-res crop IoU while dense NMS downsamples to 640 px first. Borderline pairs near the 0.5 threshold can flip between the two paths — this is a quality improvement in compact, not a bug.

Changes:

- stage_nms now returns a 4-tuple (dense_s, compact_s, nms_ok, n_diff)
- nms_ok is strict (n_diff == 0) — no silent tolerance
- nms_mismatch_count field added to ScenarioResult for JSON logging
- Correctness display shows nms=✗(N) with the exact count so it's clear how many decisions differ and whether it's a rounding artefact (1–3) or a real bug (many more)
- stage_nms docstring explains the resize-vs-exact quality difference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Five new test classes covering the full CompactMask surface against dense ground truth, each parametrised over 10 seeds (seeds 0–9) with varying object counts (N = 1, 5, 20, 50) and image sizes (50×50 to 1080×1920):

- TestCompactMaskRoundtripRandom — from_dense→to_dense pixel equality, shape/len, and per-index access
- TestCompactMaskAreaRandom — .area and .sum(axis=(1,2)) match dense
- TestCompactMaskFilterRandom — boolean and integer-list filter parity
- TestCompactMaskWithOffsetRandom — with_offset matches move_masks for random offsets, including partial and full out-of-frame cases
- TestCompactMaskIouRandom — compact_mask_iou_batch matches dense mask_iou_batch; self-IoU diagonal is 1.0; tight-bbox parity

All 198 tests pass in <1 s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Across compact_mask.py, iou_and_nms.py, and both test files:

- n → num_masks (or num_pixels in _rle_decode)
- h, w → img_h, img_w
- i (loop) → mask_idx
- i, j (iou pair loop) → idx_a, idx_b
- i (chunked loop) → chunk_start
- i (nms loop) → row_idx
- m (merge loop) → cm
- r (area comprehension) → rle
- a, b (mask arrays in tests) → masks_a, masks_b
- k (selected count) → num_selected
- g, d (jaccard loop) → gt_box, det_box

Coordinate shorthands (x1, y1, dx, dy, ix1, iy1, etc.) are left unchanged as they are standard and unambiguous in geometric contexts.
…ssion

Dense masks were resized to 640 px before IoU computation, while CompactMask used exact full-resolution crop IoU. For borderline pairs whose true IoU is close to the threshold, the downscaling flipped keep/suppress decisions.

Fix: call mask_iou_batch directly on full-resolution masks for both paths. The mask_dimension parameter is kept for backward compatibility but is now a no-op. Add a regression test at 1920×1080 with a borderline pair near IoU = 0.5 to prevent recurrence. Existing tests used ≤40×40 images where resize upscaled (no information loss), so the lossy code path was never exercised.

Also revise the benchmark parameter matrix: FHD-200/400, 4K-100, SAT-200 tiers; fill fractions [0.05, 0.20, 0.50] to match real supervision/SAM-2 use cases; IOU_DENSE_SKIP_GB=1.0 so IoU+NMS dense timing is only run for sub-1 GB tiers.
- NMS section: remove the resize_masks/640 px approximation (bug was fixed — both paths now call mask_iou_batch directly with exact IoU)
- Operating point: replace nonexistent 4K-500-5% with FHD-200-50%-v600 as the primary reference scenario throughout all analysis sections
- Per-operation speedups: cite measured values from the new benchmark run (.area 176×, filter 467×, annotate 26×, iou 464×, nms 109×, merge 908×, offset 2214×, centroids 19× at FHD-200-50%-v600; SAT-200 extremes: merge 272,709×, offset 183,199×)
- Tier table: 3 tiers → 6 tiers (FHD-100/200/400, 4K-100/200, SAT-200); fill fractions 5/10/20% → 5/20/50% (sparse/moderate/SAM-everything)
- Sample results table: 5 rows → 8 representative rows covering the full range; add Area/Filter/Annot/IoU/NMS/Merge/Offset speedup columns; update skip-threshold footnote (IOU_DENSE_SKIP_GB=1.0, not 12 GB)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This pull request adds support for using the new `CompactMask` class throughout the `supervision` library, enabling more efficient storage and manipulation of segmentation masks. The changes ensure that `CompactMask` can be seamlessly used in place of dense numpy arrays in `Detections`, annotators, metrics, and utility functions, while maintaining backward compatibility. Comprehensive integration tests are also added to verify correct behavior.

Core support for CompactMask in Detections and related utilities:

- `Detections` now accepts `CompactMask` as the `mask` field, and all relevant methods (e.g., merging, area calculation, validation) are updated to handle `CompactMask` efficiently. This includes optimized merging and area computation without materializing dense arrays. [1] [2] [3] [4]
- The `annotate` method in annotators (e.g., `MaskAnnotator`) is updated to efficiently paint masks using `CompactMask` crops, avoiding unnecessary memory allocation. [1] [2]

Integration with utility functions and metrics:

- `calculate_masks_centroids` and `get_mask_size_category` now accept `CompactMask` and operate efficiently on its representation. [1] [2] [3]
- The `move_detections` function in `inference_slicer` adjusts `CompactMask` offsets directly for efficient mask translation.

Testing and validation:

- Added `test_compact_mask_integration.py`, covering construction, filtering, merging, annotation, and equality checks for `Detections` with `CompactMask`.

General improvements and imports:

- Imports updated to include `CompactMask` and ensure type safety. [1] [2]

These changes collectively enable efficient, flexible, and transparent use of compact mask representations across the entire `supervision` library.