VideoFormat ↔ VideoFormatComponent reference cycle prevents gc from freeing decoded frames #2206

## Summary

`VideoFormat` and `VideoFormatComponent` form a reference cycle, so CPython's reference counting alone can never free them. Every decoded frame creates these objects. They accumulate until the cyclic garbage collector runs, either automatically or through an explicit `gc.collect()` call. This is a significant memory issue for long-running video processing applications.

## The cycle

```
VideoFormat.components (tuple) → VideoFormatComponent.format → VideoFormat
```

In `format.pyx` / `format.py`:
- `VideoFormat._init()` eagerly creates: `self.components = tuple(VideoFormatComponent(self, i) for ...)`
- `VideoFormatComponent.__cinit__()` stores: `self.format = format` (strong back-reference)

Both fields are `cdef` attributes. User code therefore cannot reassign or clear them from Python to break the cycle by hand.

## Minimal reproducer

```python
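import gc

import av  # import before the baseline collection so import-time garbage is not counted

gc.collect()                    # start from a clean baseline
gc.set_debug(gc.DEBUG_SAVEALL)  # keep unreachable objects in gc.garbage for inspection
gc.disable()                    # stop automatic collections; only refcounting can free objects now

fmt = av.VideoFormat('bgr24', 1920, 1080)
del fmt  # without the cycle, refcounting would free everything right here

n = gc.collect()
print(f"gc.collect() freed: {n}")
print(f"gc.garbage: {len(gc.garbage)} objects")
for obj in gc.garbage:
    print(f"  {type(obj).__name__}: {repr(obj)[:80]}")
gc.garbage.clear()
gc.set_debug(0)
gc.enable()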
```

Output:
```
gc.collect() freed: 5
gc.garbage: 5 objects
  VideoFormat: <av.VideoFormat bgr24, 1920x1080>
  VideoFormatComponent: <av.video.format.VideoFormatComponent object at 0x...>
  VideoFormatComponent: <av.video.format.VideoFormatComponent object at 0x...>
  VideoFormatComponent: <av.video.format.VideoFormatComponent object at 0x...>
  tuple: (<av.video.format.VideoFormatComponent object at 0x...>, ...
```

Every `VideoFormat` that goes out of scope leaves 5 objects alive until the cyclic GC runs. For `bgr24` these are 1 format, 3 components and 1 tuple.

## Real-world impact

We run a long-lived video processing pipeline decoding 64 concurrent RTSP streams (12× 4K HEVC). Without periodic `gc.collect()` calls, memory grows from 7 GB to 16+ GB due to accumulated `VideoFormat` / `VideoFormatComponent` cycles from decoded frames.

We instrumented `gc.collect()` with `DEBUG_SAVEALL` during a live run and the dominant garbage types are:

| Type | Count per collection |
|------|---------------------|
| `av.video.format.VideoFormatComponent` | 105 |
| `av.video.format.VideoFormat` | 35 |
| `tuple` (component tuples) | 104 |
| `av.video.frame.VideoFrame` | ~5 |
| `av.sidedata.sidedata.SideDataContainer` | ~5 |
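Below is a minimal sketch of this kind of instrumentation. It is not our exact code. The helper name `tally_cyclic_garbage` is hypothetical. It runs one full collection with `DEBUG_SAVEALL` enabled and counts the unreachable objects by fully qualified type name. That gives a breakdown like the table above.

```python
import collections
import gc


def tally_cyclic_garbage(top=10):
    """Run one full collection and count what the cyclic GC found unreachable.

    DEBUG_SAVEALL makes the collector append unreachable objects to
    gc.garbage instead of freeing them, so they can be inspected.
    Pass top=None to get every type instead of the `top` most common.
    """
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        gc.collect()  # full (generation 2) collection
        counts = collections.Counter(
            f"{type(obj).__module__}.{type(obj).__qualname__}" for obj in gc.garbage
        )
    finally:
        # Drop the saved objects so a later collection can actually free them,
        # and restore whatever debug flags were set before.
        gc.garbage.clear()
        gc.set_debug(old_flags)
    return counts.most_common(top)
```

Keep calls to something like this infrequent. Each call does a full sweep, the same cost as the workaround below.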

Our workaround is a dedicated daemon thread that calls `gc.collect()` every 15 seconds. This keeps memory stable. However, each full sweep holds the GIL for its entire duration, which causes latency spikes in our inference pipeline.
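Here is a minimal sketch of that workaround. The names `start_periodic_gc` and `on_collect` are hypothetical. The 15-second default matches the interval above. Using `threading.Event.wait` as the sleep lets the thread exit promptly when asked to stop.

```python
import gc
import threading


def start_periodic_gc(interval_s=15.0, on_collect=None):
    """Call gc.collect() every `interval_s` seconds on a daemon thread.

    Returns (stop_event, thread); set stop_event to end the loop.
    """
    stop_event = threading.Event()

    def _loop():
        # Event.wait() returns False on timeout (time to collect) and
        # True as soon as stop_event is set (time to exit).
        while not stop_event.wait(interval_s):
            found = gc.collect()  # full collection; holds the GIL while it runs
            if on_collect is not None:
                on_collect(found)

    thread = threading.Thread(target=_loop, name="periodic-gc", daemon=True)
    thread.start()
    return stop_event, thread
```

Typical use is `stop_event, _ = start_periodic_gc()` at startup. Running the collection on a background thread changes *when* the pause happens, not how long it lasts. Other Python threads still wait for the GIL while `gc.collect()` runs.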

## Suggested fix

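Make `components` lazy: build it on each access rather than storing it on the instance. This is similar to how #517 / PR #516 made `planes` lazy to fix the `VideoFrame` cycle. Caching the tuple on first access would recreate the cycle as soon as `components` is read. Alternatively, use a weakref for `VideoFormatComponent.format`.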
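Below is a minimal pure-Python model of the two suggested fixes. It uses hypothetical stand-in classes (`EagerFormat`, `LazyFormat`, `WeakBackrefFormat`, `Component`, `WeakComponent`), not PyAV's Cython types. With the cyclic GC disabled, it uses a weak reference to check whether each variant is freed by reference counting alone once the last reference is dropped.

```python
import gc
import weakref


class Component:
    """Hypothetical stand-in for VideoFormatComponent (strong back-reference)."""

    def __init__(self, fmt, index):
        self.format = fmt
        self.index = index


class EagerFormat:
    """Models current behaviour: components tuple stored on the instance -> cycle."""

    def __init__(self, n):
        self.components = tuple(Component(self, i) for i in range(n))


class LazyFormat:
    """Fix 1: build components on every access; nothing is stored on self."""

    def __init__(self, n):
        self._n = n

    @property
    def components(self):
        return tuple(Component(self, i) for i in range(self._n))


class WeakComponent:
    """Fix 2: hold the parent format through a weak reference."""

    def __init__(self, fmt, index):
        self._format_ref = weakref.ref(fmt)
        self.index = index

    @property
    def format(self):
        return self._format_ref()  # None once the format has been freed


class WeakBackrefFormat:
    def __init__(self, n):
        self.components = tuple(WeakComponent(self, i) for i in range(n))


def freed_by_refcount(cls):
    """Return True if an instance is freed as soon as its last reference goes away."""
    was_enabled = gc.isenabled()
    gc.collect()
    gc.disable()  # make sure only reference counting can free the object
    try:
        obj = cls(3)
        touched = obj.components  # read components, as frame-handling code would
        probe = weakref.ref(obj)
        del obj, touched
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    for cls in (EagerFormat, LazyFormat, WeakBackrefFormat):
        print(cls.__name__, freed_by_refcount(cls))
```

This should print `False` for `EagerFormat` and `True` for the other two variants. Each approach has a cost in a Cython extension type. The weakref variant would also need `VideoFormat` to declare a `__weakref__` attribute, because extension types do not support weak references by default. The per-access property avoids that change but allocates a fresh tuple every time `components` is read.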

## Versions

- PyAV 13.1.0 (our production version)
- Also confirmed on PyAV 17.0.0 (`main` branch) — `format.py` and `format.pxd` still have the same eager `components` tuple and strong `cdef VideoFormat format` back-reference.

## Related

- #517 — Fixed AudioFrame / VideoFrame plane cycles (2019), but did not address VideoFormat ↔ VideoFormatComponent
- #516 — PR that made planes lazy
