[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing#2191

Open
gaurav02081 wants to merge 12 commits into CCExtractor:master from gaurav02081:gaurav-ffmpeg

@gaurav02081
Contributor

gaurav02081 commented Mar 8, 2026

[IMPROVEMENT] feat(mp4): add FFmpeg/libavformat backend for MP4 demuxing

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Add optional FFmpeg-based MP4 parser as an alternative to GPAC

This PR introduces an alternative MP4 parsing backend using FFmpeg's libavformat, while keeping the existing GPAC-based implementation unchanged and as the default.

Motivation

In a previous discussion (the GSoC meeting on 2 March) we talked about updating the GPAC dependency used for MP4 processing in CCExtractor. One suggestion was to explore whether a Debian-friendly alternative exists, rather than focusing only on upgrading GPAC.

FFmpeg is already used in other parts of the codebase (for example in the demuxing/decoding integration and in the HardsubX module), so extending its use for MP4 parsing seemed like a reasonable option to explore.

Implementation

A new implementation (mp4_ffmpeg.c) was added which uses FFmpeg's libavformat to open and parse MP4 containers.

The general workflow is:

  • Open the MP4 container with avformat_open_input()
  • Discover streams using avformat_find_stream_info()
  • Read packets sequentially using av_read_frame()
  • Dispatch packets based on stream type

Video packets (H.264 / HEVC) are passed to the existing do_NAL() processing logic, while caption tracks (CEA-608 / CEA-708) and subtitle tracks (tx3g) continue to use the existing CCExtractor parsing functions.
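
The dispatch step described above can be sketched roughly as follows. This is only an illustration, not the code in mp4_ffmpeg.c: the `TrackKind` enum, the `classify` and `handler_for` functions, and the choice of MP4 sample entry codes (`avc1`, `hvc1`, `c608`, `c708`, `tx3g`) as the classification key are assumptions made for the example. A libavformat-based parser would more likely switch on the stream's codec ID.

```rust
/// Track categories named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrackKind {
    Avc,
    Hevc,
    Cea608,
    Cea708,
    Tx3g,
    Other,
}

/// Hypothetical classifier keyed on the MP4 sample entry code.
pub fn classify(fourcc: &str) -> TrackKind {
    match fourcc {
        "avc1" | "avc3" => TrackKind::Avc,
        "hvc1" | "hev1" => TrackKind::Hevc,
        "c608" => TrackKind::Cea608,
        "c708" => TrackKind::Cea708,
        "tx3g" => TrackKind::Tx3g,
        _ => TrackKind::Other,
    }
}

/// Where each kind of packet is routed, following the PR text:
/// video goes to do_NAL(), captions and subtitles to the existing parsers.
pub fn handler_for(kind: TrackKind) -> &'static str {
    match kind {
        TrackKind::Avc | TrackKind::Hevc => "do_NAL",
        TrackKind::Cea608 | TrackKind::Cea708 | TrackKind::Tx3g => {
            "existing caption/subtitle parser"
        }
        TrackKind::Other => "skipped",
    }
}
```

Streams that fall into `Other` (audio, for example) would simply be ignored by the read loop.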

One difference from the GPAC implementation is that FFmpeg reads packets sequentially across all streams, whereas the GPAC implementation reads samples per track. The downstream caption extraction pipeline remains unchanged.
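
To make the difference concrete, the sketch below regroups an interleaved packet sequence into per-stream lists. The `Packet` struct and `group_by_stream` function are hypothetical and exist only for this illustration; the point is that order within each stream is preserved even though streams alternate on input.

```rust
use std::collections::BTreeMap;

/// A demuxed packet reduced to the two fields this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Packet {
    pub stream_index: usize,
    pub pts: i64,
}

/// Regroup packets that arrive interleaved across streams into per-stream
/// lists, preserving arrival order within each stream.
pub fn group_by_stream(packets: &[Packet]) -> BTreeMap<usize, Vec<i64>> {
    let mut per_stream: BTreeMap<usize, Vec<i64>> = BTreeMap::new();
    for p in packets {
        per_stream.entry(p.stream_index).or_default().push(p.pts);
    }
    per_stream
}
```

A sequential reader never needs to build such a map. It can hand each packet to the handler for its stream as soon as the packet arrives.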

For H.264 / HEVC streams, codec configuration data is obtained from the stream extradata (avcC / hvcC) in order to determine the NAL unit length prefix size and extract SPS/PPS before processing packets.
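
As a rough sketch of what reading that configuration involves, the code below parses an avcC record (the AVCDecoderConfigurationRecord layout) to recover the length prefix size and the SPS/PPS payloads. The `AvcConfig` struct and the function names are assumptions for illustration and are not taken from the PR. The hvcC layout used for HEVC is different and is not covered here.

```rust
/// Fields of interest from an avcC record.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AvcConfig {
    pub nal_length_size: usize,
    pub sps: Vec<Vec<u8>>,
    pub pps: Vec<Vec<u8>>,
}

/// Read `count` parameter sets, each prefixed by a 16-bit big-endian length.
fn read_sets(data: &[u8], pos: &mut usize, count: usize) -> Option<Vec<Vec<u8>>> {
    let mut sets = Vec::with_capacity(count);
    for _ in 0..count {
        let len_bytes = data.get(*pos..*pos + 2)?;
        let len = u16::from_be_bytes([len_bytes[0], len_bytes[1]]) as usize;
        *pos += 2;
        sets.push(data.get(*pos..*pos + len)?.to_vec());
        *pos += len;
    }
    Some(sets)
}

/// Parse avcC extradata. Returns `None` on a bad version or truncated input.
pub fn parse_avcc(extradata: &[u8]) -> Option<AvcConfig> {
    // Six header bytes plus the PPS count byte is the shortest valid record.
    if extradata.len() < 7 || extradata[0] != 1 {
        return None;
    }
    // Low two bits of byte 4 hold lengthSizeMinusOne.
    let nal_length_size = (extradata[4] & 0x03) as usize + 1;
    // Low five bits of byte 5 hold the number of SPS entries.
    let num_sps = (extradata[5] & 0x1f) as usize;
    let mut pos = 6;
    let sps = read_sets(extradata, &mut pos, num_sps)?;
    let num_pps = *extradata.get(pos)? as usize;
    pos += 1;
    let pps = read_sets(extradata, &mut pos, num_pps)?;
    Some(AvcConfig { nal_length_size, sps, pps })
}
```

The resulting `nal_length_size` tells the packet handler how many bytes precede each NAL unit in a sample.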

Build configuration

This backend is optional and controlled through a compile-time flag:

-DUSE_FFMPEG_MP4=ON
  • Default build → uses GPAC (mp4.c)
  • FFmpeg build → uses the new implementation (mp4_ffmpeg.c)

The runtime behavior of CCExtractor remains unchanged — the difference only affects how the MP4 container is parsed internally.

Summary

This PR:

  • Adds an FFmpeg-based MP4 parser
  • Keeps GPAC as the default implementation
  • Introduces a compile-time option to switch between the two backends
  • Leaves the caption extraction pipeline unchanged

This provides a potential alternative MP4 backend using a widely available multimedia framework while preserving the existing behavior.

Add a new CI job (cmake_ffmpeg_mp4) that builds CCExtractor with the
optional FFmpeg-based MP4 parser enabled via -DUSE_FFMPEG_MP4=ON.

The workflow now verifies two builds:
- Default build using GPAC
- FFmpeg MP4 build using a separate build directory

This ensures the FFmpeg backend compiles successfully alongside the
default GPAC implementation.

gaurav02081 force-pushed the gaurav-ffmpeg branch 2 times, most recently from c83ebd9 to 1d96d2c, on March 9, 2026 17:49

@gaurav02081
Contributor Author

Update: Added a new CI job to build CCExtractor with the optional FFmpeg MP4 backend.

The workflow now performs two builds:

  • Default build using the GPAC implementation
  • FFmpeg build using -DUSE_FFMPEG_MP4=ON

Both builds run --version to verify that the binaries execute correctly, and separate build directories are used to avoid CMake cache conflicts.

@cfsmp3
Contributor

cfsmp3 commented Mar 14, 2026

Thanks for the PR — the implementation is clean and well-structured, and the dedicated CI job is a nice touch.

However, we're actively moving the codebase from C to Rust, so we can't accept new C modules. We'd need the FFmpeg MP4 demuxer to be implemented in Rust. See #2170 for an example of how this can be done using rsmpeg (Rust FFmpeg bindings) with a thin C bridge for the callbacks into the existing C code.

If you'd like to rework this in Rust we'd be happy to review it. The overall approach (avformat_open_input → av_read_frame → dispatch by stream type) is sound, it's just the implementation language that needs to change.

  Replace GPAC-based MP4 demuxer with FFmpeg-based implementation in Rust
  using rsmpeg, activated via -DWITH_FFMPEG=ON. GPAC path remains default
  and untouched when FFmpeg is not enabled.

  Architecture:
  - Rust core (demuxer/mp4.rs): opens MP4 via rsmpeg, classifies tracks,
    dispatches packets to C bridge functions
  - FFI exports (mp4_ffmpeg_exports.rs): ccxr_processmp4/ccxr_dumpchapters
    callable from C
  - C bridge (mp4_rust_bridge.c): flat FFI-safe wrappers around existing
    do_NAL, process608, dtvcc_process_data, store_hdcc, encode_sub

  Supports AVC/H.264, HEVC/H.265, CEA-608, CEA-708, and tx3g tracks.
  Parses SPS/PPS/VPS from extradata for proper caption extraction.

  Build: cmake -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src
  CI: fixed Linux cmake_ffmpeg_mp4 job, added macOS cmake_ffmpeg_mp4 job
gaurav-dev02 and others added 9 commits March 15, 2026 23:34
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rusty_ffmpeg requires libavdevice for linking.

rsmpeg unconditionally links against libswresample, so it must
be linked at the final executable level too.

GNU ld processes libraries left-to-right. ccx_rust contains rsmpeg
which needs FFmpeg symbols (e.g. swr_get_out_samples), so FFmpeg
libs must appear after ccx_rust in the link order.

CMake deduplicates libraries in target_link_libraries, so adding
FFmpeg libs twice via EXTRA_LIBS had no effect. Using
target_link_options with -l flags avoids deduplication and ensures
they appear after ccx_rust in the GNU ld link order.

Move FFmpeg libraries from EXTRA_LIBS into a separate EXTRA_FFMPEG_LIBS
variable, appended after ccx_rust. This ensures GNU ld (which processes
left-to-right) sees ccx_rust's FFmpeg symbol references before the
shared libs that provide them (e.g. swr_get_out_samples from
libswresample). Previous approaches failed because CMake deduplicates
library entries and target_link_options places flags before libraries.

Make ccx's link dependencies PRIVATE so FFmpeg libs don't propagate
as transitive dependencies to ccextractor. CMake was deduplicating
the EXTRA_FFMPEG_LIBS (placed after ccx_rust) against identical
entries propagated from ccx (placed before ccx_rust), keeping the
earlier occurrence and breaking GNU ld's left-to-right resolution.

Corrosion places ccx_rust at the absolute end of the link line.
On Linux, ccx_rust (containing rsmpeg) needs FFmpeg symbols like
swr_get_out_samples. By adding FFmpeg libs to ccx_rust's
INTERFACE_LINK_LIBRARIES, CMake places them right after ccx_rust,
ensuring correct GNU ld left-to-right symbol resolution.

Also make ccx's link deps PRIVATE to prevent transitive propagation
that could cause deduplication issues.

GNU ld only pulls object files from static libraries when they
resolve currently-needed symbols. Bridge functions in libccx.a
(ccx_mp4_process_avc_sample etc.) are not needed until libccx_rust.a
is processed, but libccx.a comes first. Using --undefined forces
the linker to pull these symbols early, same pattern used for
decode_vbi/do_cb/store_hdcc.

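
The link-order reasoning in these commit messages can be illustrated with a small simulation of single-pass, left-to-right static archive resolution. This is a simplified model written for this discussion, not how GNU ld is implemented: the `Member` struct and `link` function are hypothetical, shared libraries are ignored, and the two-archive layout only mirrors the libccx.a / libccx_rust.a situation described above.

```rust
use std::collections::BTreeSet;

/// One object file inside a static archive: the symbols it defines and needs.
pub struct Member {
    pub defines: &'static [&'static str],
    pub needs: &'static [&'static str],
}

/// Scan archives once, in order. A member is pulled in only if it defines a
/// symbol that is undefined at that moment. `initially_undefined` models
/// references from plain object files plus any `--undefined` flags.
/// Returns the symbols still unresolved after the last archive.
pub fn link(
    archives: &[Vec<Member>],
    initially_undefined: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut undefined: BTreeSet<&'static str> =
        initially_undefined.iter().copied().collect();
    let mut defined: BTreeSet<&'static str> = BTreeSet::new();
    for archive in archives {
        let mut pulled = vec![false; archive.len()];
        // Rescan this archive until no further member satisfies a pending symbol.
        loop {
            let mut progress = false;
            for (i, m) in archive.iter().enumerate() {
                if pulled[i] || !m.defines.iter().any(|s| undefined.contains(s)) {
                    continue;
                }
                pulled[i] = true;
                progress = true;
                for s in m.defines {
                    undefined.remove(s);
                    defined.insert(*s);
                }
                for s in m.needs {
                    if !defined.contains(s) {
                        undefined.insert(*s);
                    }
                }
            }
            if !progress {
                break;
            }
        }
    }
    undefined
}
```

With the bridge archive listed first and only `ccxr_processmp4` wanted at the start, the bridge member is skipped and `ccx_mp4_process_avc_sample` ends up unresolved. Seeding the undefined set with that symbol, which is what `--undefined` does, makes the first archive contribute its member in time.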
@gaurav02081
Contributor Author

This update reworks the demuxer as an FFmpeg-based implementation written in Rust using rsmpeg, activated via -DWITH_FFMPEG=ON. The GPAC path
remains the default and is completely untouched when FFmpeg is not enabled.

Architecture

The implementation follows the same 3-layer pattern as PR #2170:

Layer 1: Rust Core (src/rust/src/demuxer/mp4.rs)

  • Opens MP4 via rsmpeg::AVFormatContextInput
  • Classifies streams: AVC/H.264, HEVC/H.265, CEA-608, CEA-708, tx3g
  • Parses SPS/PPS/VPS from codec extradata
  • Main av_read_frame loop dispatches packets to C bridge functions

Layer 2: FFI Exports (src/rust/src/mp4_ffmpeg_exports.rs)

  • ccxr_processmp4() — replaces processmp4()
  • ccxr_dumpchapters() — replaces dumpchapters()

Layer 3: C Bridge (src/lib_ccx/mp4_rust_bridge.c)

  • Flat FFI-safe wrappers around existing C processing functions
  • ccx_mp4_process_avc_sample() → NAL parsing via do_NAL()
  • ccx_mp4_process_hevc_sample() → NAL parsing + store_hdcc() flush
  • ccx_mp4_process_cc_packet() → CEA-608 via process608(), CEA-708 via ccdp_find_data() + ccxr_dtvcc_process_data()
  • ccx_mp4_process_tx3g_packet() → 3GPP timed text
  • No GPAC structs cross the FFI boundary — only primitive types
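
Before a video sample can be handed to NAL parsing, it has to be split into individual NAL units. A minimal sketch of that step is shown below, assuming the usual MP4 layout in which every unit is preceded by a big-endian length field. The function name `split_nal_units` is hypothetical and does not come from the PR, and whether the split happens on the Rust side or inside the C bridge is not stated in the description.

```rust
/// Split one MP4 video sample into NAL units. Each unit is preceded by a
/// big-endian length field of `nal_length_size` bytes (1 to 4).
/// Returns `None` if the size is unsupported or the sample is truncated.
pub fn split_nal_units(sample: &[u8], nal_length_size: usize) -> Option<Vec<&[u8]>> {
    if !(1..=4).contains(&nal_length_size) {
        return None;
    }
    let mut units = Vec::new();
    let mut pos = 0;
    while pos < sample.len() {
        // Read the length prefix, most significant byte first.
        let prefix = sample.get(pos..pos + nal_length_size)?;
        let len = prefix.iter().fold(0usize, |acc, &b| (acc << 8) | b as usize);
        pos += nal_length_size;
        // Borrow the payload; a length past the end of the sample is an error.
        let end = pos.checked_add(len)?;
        units.push(sample.get(pos..end)?);
        pos = end;
    }
    Some(units)
}
```

Because the result borrows from the input, each unit can be passed across an FFI boundary as a plain pointer and length, which fits the primitive-types-only rule above.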

Build System

cmake -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src

  • ENABLE_FFMPEG_MP4 compile flag gates all new code
  • Cargo feature enable_mp4_ffmpeg pulls in rsmpeg
  • Corrosion passes the feature to Cargo when WITH_FFMPEG is on
  • FFmpeg libs are added as INTERFACE_LINK_LIBRARIES of ccx_rust to ensure correct GNU ld link order
  • Bridge functions use --undefined linker flags (same pattern as existing decode_vbi/do_cb/store_hdcc)
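
On the Rust side, a Cargo feature is typically consumed through `cfg` attributes. The sketch below shows the general pattern only. The feature name `enable_mp4_ffmpeg` comes from the list above, while the `mp4_backend` function is a made-up stand-in for whatever code the feature actually gates.

```rust
/// Compiled only when the crate is built with `--features enable_mp4_ffmpeg`.
#[cfg(feature = "enable_mp4_ffmpeg")]
pub fn mp4_backend() -> &'static str {
    "ffmpeg"
}

/// Fallback used when the feature is off, standing in for the GPAC path.
#[cfg(not(feature = "enable_mp4_ffmpeg"))]
pub fn mp4_backend() -> &'static str {
    "gpac"
}
```

Exactly one of the two definitions exists in any given build, so callers do not need their own conditional code.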

Files

New (5):

src/rust/src/demuxer/mp4.rs          Rust FFmpeg demuxer core
src/rust/src/mp4_ffmpeg_exports.rs   C-callable FFI exports
src/lib_ccx/mp4_rust_bridge.c        Thin C bridge
src/lib_ccx/mp4_rust_bridge.h        Bridge header
src/lib_ccx/ccx_gpac_types.h         Minimal GPAC-compatible types

Modified (13):

Cargo.toml, build.rs, lib.rs, demuxer/mod.rs, wrapper.h, rust/CMakeLists.txt, src/CMakeLists.txt, lib_ccx/CMakeLists.txt, ccx_mp4.h,
ccextractor.c, build_linux.yml, build_mac.yml

Testing

  • GPAC-only build (cmake ../src): compiles and links, no regression
  • FFmpeg build (cmake -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON ../src): compiles and links
  • Runtime: ./ccextractor tests/samples/BBC1.mp4 produces identical output to the GPAC path
  • CI: Linux and macOS cmake_ffmpeg_mp4 jobs added

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 5c87a33...:
Report Name   Tests Passed
Broken        10/13
CEA-708       2/14
DVB           4/7
DVD           3/3
DVR-MS        2/2
General       27/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       79/86
Teletext      20/21
WTV           13/13
XDS           34/34

Your PR breaks these cases:

  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 5c87a33...:
Report Name   Tests Passed
Broken        10/13
CEA-708       2/14
DVB           4/7
DVD           3/3
DVR-MS        2/2
General       27/27
Hardsubx      1/1
Hauppage      3/3
MP4           3/3
NoCC          10/10
Options       81/86
Teletext      20/21
WTV           13/13
XDS           34/34

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

This PR does not introduce any new test failures. However, some tests are failing on both master and this PR (see above).

Check the result page for more info.
