Multiple calls to create_chat_completion() fail with "llama_decode: failed to decode, ret = -1" #2140

@thisisayushg

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I am running LiquidAI's LFM2.5-1.2B-Instruct model. Calling create_chat_completion() multiple times on the same Llama instance should not raise an error.

Current Behavior

When calling create_chat_completion() twice in a row, the second call fails with the error "llama_decode: failed to decode, ret = -1".

Environment and Context

I am using llama-cpp-python v0.3.16 on Python 3.13.9.

  • Windows 11

Failure Information (for bugs)

Tracing the issue back, it appears the context and KV cache need to be reset between create_chat_completion() calls, but this is not currently happening.

Steps to Reproduce

from pathlib import Path
from llama_cpp import Llama

llm = Llama(
    model_path=str(Path.home() / "AppData/Local/llama.cpp/LiquidAI_LFM2.5-1.2B-Instruct-GGUF_LFM2.5-1.2B-Instruct-Q4_K_M.gguf"),
    n_ctx=1000,
)

system_prompt = "You are a helpful assistant"
prompt = "suggest me places to visit during winter season"

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
)
print(response)

# llm.reset()                   # Using this works
# llm._ctx.kv_cache_clear()     # Using this works

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ],
)
print(response)

Failure Logs

init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
 - the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 519
 - the tokens for sequence 0 in the input batch have a starting position of Y = 29
 it is required that the sequence positions remain consecutive: Y = X + 1
decode: failed to initialize batch
llama_decode: failed to decode, ret = -1
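
The invariant the log reports can be illustrated with a small standalone sketch (plain Python, no llama.cpp needed; the function name and structure are illustrative, not the actual library code): the KV cache for a sequence ends at position X, and a new batch for that sequence must start at Y = X + 1.

```python
def validate_batch(last_cached_pos, batch_start_pos):
    """Mimic the consistency check llama_decode performs: token
    positions within a sequence must remain consecutive (Y = X + 1).
    An empty cache is modeled as last_cached_pos = -1."""
    if batch_start_pos != last_cached_pos + 1:
        raise ValueError(
            f"inconsistent sequence positions: cache ends at X = {last_cached_pos}, "
            f"batch starts at Y = {batch_start_pos}; required Y = X + 1"
        )

# First call: empty cache (X = -1), prompt starts at position 0 -> OK.
validate_batch(-1, 0)

# Second call without a reset: the cache still ends at X = 519 from the
# first completion, but the new prompt is assigned positions starting at
# Y = 29 -> fails, matching the log above.
try:
    validate_batch(519, 29)
except ValueError as e:
    print(e)
```

This is why clearing the KV cache between calls (as in the commented-out workarounds above) avoids the error: it puts the cache back in the empty state, so the next prompt can legally start at position 0.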
