Description
Context
The current release (v0.3.16, Aug 15 2025) vendors llama.cpp at commit 4227c9b (Aug 14, 2025). ModernBERT architecture support was added to llama.cpp on Dec 22, 2025 in PR ggml-org/llama.cpp#15641 (src/models/modern-bert.cpp). The vendor submodule hasn't been updated in ~7 months.
Problem
Loading a ModernBERT GGUF model fails with:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'modernbert'
The binary only knows: bert, falcon, gemma, gpt2, llama, mamba, qwen, rwkv, stablelm — no modernbert.
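As a side note, you can confirm a model's architecture without going through llama.cpp at all by reading the `general.architecture` key from the GGUF header. Below is a rough sketch (not part of this project); it only handles the common case where that key is a string appearing before any non-string metadata:

```python
import struct

def gguf_architecture(path):
    """Read the `general.architecture` metadata key from a GGUF header.

    Minimal sketch: only decodes string-typed (type 8) metadata values,
    so it bails out if a non-string key appears first. A full parser
    must handle every GGUF value type.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        for _ in range(n_kv):
            klen, = struct.unpack("<Q", f.read(8))
            key = f.read(klen).decode("utf-8")
            vtype, = struct.unpack("<I", f.read(4))
            if vtype != 8:  # 8 = GGUF string; other types not handled here
                raise ValueError(f"unhandled value type {vtype} for key {key!r}")
            vlen, = struct.unpack("<Q", f.read(8))
            val = f.read(vlen).decode("utf-8")
            if key == "general.architecture":
                return val
    return None
```

For the models listed below this returns `"modernbert"`, which the vendored binary does not recognize.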
Attempted fix
We tried updating the vendor submodule to llama.cpp b8373 (Mar 16, 2026) and building from source. Results:
- CMake fails with `LLAVA_BUILD=ON` (the default) — `vendor/llama.cpp/tools/mtmd/CMakeLists.txt:37` has a `set_target_properties` call incompatible with the binding's CMakeLists.txt
- CMake succeeds with `-DLLAVA_BUILD=OFF` — builds and installs fine
- Runtime ABI mismatch — the Python bindings call `llama_get_kv_self`, which no longer exists in b8373 (symbol not found). The C API has diverged significantly in 7 months.
So it's not just a submodule bump — the Python ctypes bindings need updates to match the current llama.cpp C API.
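One cheap way to catch this drift before hitting a runtime crash is to probe the built shared library for the symbols the ctypes layer expects. A rough diagnostic sketch (`has_symbol` is an illustrative helper, not an API of this project; the library path is hypothetical):

```python
import ctypes

def has_symbol(lib_path, name):
    """Return True if the shared library at lib_path exports `name`.

    ctypes raises AttributeError when the underlying dlsym lookup
    fails, which hasattr turns into a boolean.
    """
    lib = ctypes.CDLL(lib_path)
    return hasattr(lib, name)

# Hypothetical usage against a locally built library:
# has_symbol("vendor/llama.cpp/build/bin/libllama.so", "llama_get_kv_self")
```

Running a check like this over every function name declared in the bindings would give a concrete list of which calls need updating for the new C API.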
Why it matters
Several recent high-quality embedding models use ModernBERT architecture:
- granite-embedding-small-english-r2 (IBM, 47M params, 384d, BEIR NDCG@10 50.9, Apache 2.0)
- modernbert-embed-base (Nomic, 149M params, Apache 2.0)
- gte-modernbert-base (Alibaba, 149M params)
These models can't be used with llama-cpp-python until the binding is updated.
Request
Update the Python ctypes bindings and CMakeLists.txt to be compatible with a recent llama.cpp build that includes ModernBERT support (post Dec 22, 2025).