feat: expose attention_type parameter in Llama.__init__ #2143
Draft
jamesbiederbeck wants to merge 1 commit into abetlen:main from
Conversation
`llama_context_params` already contains an `attention_type` field, and `llama_cpp.py` defines the `LLAMA_ATTENTION_TYPE_*` constants, but `Llama.__init__` does not expose this parameter. This makes it impossible to select non-causal attention from Python, which is required for embedding models trained with bidirectional attention (e.g. GTE/Qwen embedding models).

This PR wires the parameter through to `self.context_params.attention_type`, mirroring how `pooling_type` is handled.

Example usage:
```python
from llama_cpp import Llama
from llama_cpp.llama_cpp import LLAMA_ATTENTION_TYPE_NON_CAUSAL

model = Llama(
    model_path="model.gguf",
    embedding=True,
    attention_type=LLAMA_ATTENTION_TYPE_NON_CAUSAL,
)
```
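For reviewers, here is a minimal sketch of the shape of the change inside `Llama.__init__`. The exact parameter placement and the `LLAMA_ATTENTION_TYPE_UNSPECIFIED` default are assumptions on my part, modeled on how `pooling_type` is already handled; see the actual diff for the real change:

```python
# Sketch only -- not the actual diff. Assumes llama_cpp.py exposes
# LLAMA_ATTENTION_TYPE_UNSPECIFIED as the "let llama.cpp decide" default,
# mirroring LLAMA_POOLING_TYPE_UNSPECIFIED.
import llama_cpp


class Llama:
    def __init__(
        self,
        model_path: str,
        *,
        # ... existing parameters ...
        pooling_type: int = llama_cpp.LLAMA_POOLING_TYPE_UNSPECIFIED,
        attention_type: int = llama_cpp.LLAMA_ATTENTION_TYPE_UNSPECIFIED,  # new
        **kwargs,
    ):
        self.context_params = llama_cpp.llama_context_default_params()
        # ... existing context_params setup ...
        self.context_params.pooling_type = pooling_type
        # New: forward the attention type to llama.cpp via the context params,
        # the same path pooling_type already takes.
        self.context_params.attention_type = attention_type
```

With this in place, constructing the model as in the usage example above and then calling `model.embed(...)` should produce embeddings computed with bidirectional attention.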