Description
Context
The current release (v0.3.16, Aug 15 2025) vendors llama.cpp at commit 4227c9b (Aug 14, 2025). ModernBERT architecture support was added to llama.cpp on Dec 22, 2025 in PR ggml-org/llama.cpp#15641 (src/models/modern-bert.cpp). The vendor submodule hasn't been updated in ~7 months.
Problem
Loading a ModernBERT GGUF model fails with:
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'modernbert'
The binary only knows: bert, falcon, gemma, gpt2, llama, mamba, qwen, rwkv, stablelm — no modernbert.
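As a side note, you can confirm a model's architecture without going through llama.cpp at all by reading the `general.architecture` key from the GGUF header. Below is a rough sketch (not part of this project); it only handles the common case where that key is a string appearing before any non-string metadata:

```python
import struct

def gguf_architecture(path):
    """Read the `general.architecture` metadata key from a GGUF header.

    Minimal sketch: only decodes string-typed (type 8) metadata values,
    so it bails out if a non-string key appears first. A full parser
    must handle every GGUF value type.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        for _ in range(n_kv):
            klen, = struct.unpack("<Q", f.read(8))
            key = f.read(klen).decode("utf-8")
            vtype, = struct.unpack("<I", f.read(4))
            if vtype != 8:  # 8 = GGUF string; other types not handled here
                raise ValueError(f"unhandled value type {vtype} for key {key!r}")
            vlen, = struct.unpack("<Q", f.read(8))
            val = f.read(vlen).decode("utf-8")
            if key == "general.architecture":
                return val
    return None
```

For the models listed below this returns `"modernbert"`, which the vendored binary does not recognize.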
Attempted fix
We tried updating the vendor submodule to llama.cpp b8373 (Mar 16, 2026) and building from source. Results:
- CMake fails with `LLAVA_BUILD=ON` (the default) — `vendor/llama.cpp/tools/mtmd/CMakeLists.txt:37` has a `set_target_properties` call incompatible with the binding's CMakeLists.txt
- CMake succeeds with `-DLLAVA_BUILD=OFF` — builds and installs fine
- Runtime ABI mismatch — the Python bindings call `llama_get_kv_self`, which no longer exists in b8373 (symbol not found). The C API has diverged significantly in 7 months.
So it's not just a submodule bump — the Python ctypes bindings need updates to match the current llama.cpp C API.
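One cheap way to catch this drift before hitting a runtime crash is to probe the built shared library for the symbols the ctypes layer expects. A rough diagnostic sketch (`has_symbol` is an illustrative helper, not an API of this project; the library path is hypothetical):

```python
import ctypes

def has_symbol(lib_path, name):
    """Return True if the shared library at lib_path exports `name`.

    ctypes raises AttributeError when the underlying dlsym lookup
    fails, which hasattr turns into a boolean.
    """
    lib = ctypes.CDLL(lib_path)
    return hasattr(lib, name)

# Hypothetical usage against a locally built library:
# has_symbol("vendor/llama.cpp/build/bin/libllama.so", "llama_get_kv_self")
```

Running a check like this over every function name declared in the bindings would give a concrete list of which calls need updating for the new C API.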
Why it matters
Several recent high-quality embedding models use ModernBERT architecture:
- granite-embedding-small-english-r2 (IBM, 47M params, 384d, BEIR NDCG@10 50.9, Apache 2.0)
- modernbert-embed-base (Nomic, 149M params, Apache 2.0)
- gte-modernbert-base (Alibaba, 149M params)
These models can't be used with llama-cpp-python until the binding is updated.
Request
Update the Python ctypes bindings and CMakeLists.txt to be compatible with a recent llama.cpp build that includes ModernBERT support (post Dec 22, 2025).