Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I am running LiquidAI's LFM2.5-1.2B-Instruct model. Calling create_chat_completion() multiple times on the same Llama instance should not raise an error.
Current Behavior
When create_chat_completion() is called twice in a row, the second call fails with the error "llama_decode: failed to decode, ret = -1".
Environment and Context
- llama-cpp-python v0.3.16
- Python 3.13.9
- Windows 11
Failure Information (for bugs)
Tracing the issue back, it looks like the context and KV cache need to be reset after each create_chat_completion() call, and this reset is not currently happening.
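As a temporary user-side workaround (not a fix for the underlying library behavior), resetting the context before each completion avoids the error. A minimal sketch, assuming an existing `llama_cpp.Llama` instance; the `chat` wrapper name is hypothetical:

```python
def chat(llm, messages):
    # Workaround sketch: clear the model's internal context/KV-cache state
    # before each completion so sequence positions start fresh.
    # Assumes `llm` is a llama_cpp.Llama instance exposing reset().
    llm.reset()
    return llm.create_chat_completion(messages=messages)
```

With this wrapper, repeated calls such as `chat(llm, msgs)` no longer hit the inconsistent-sequence-position error, at the cost of losing any cached prefix between calls.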
Steps to Reproduce
from pathlib import Path
from llama_cpp import Llama

llm = Llama(
    model_path=str(Path.home() / "AppData/Local/llama.cpp/LiquidAI_LFM2.5-1.2B-Instruct-GGUF_LFM2.5-1.2B-Instruct-Q4_K_M.gguf"),
    n_ctx=1000,
)

system_prompt = """
\nYou are a helpful assistant
"""
prompt = """
suggest me places to visit during winter season
"""

response = llm.create_chat_completion(
    messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
)
print(response)

# llm.reset()                  # Workaround: using this makes the second call succeed
# llm._ctx.kv_cache_clear()    # Workaround: this also works

response = llm.create_chat_completion(
    messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
)
print(response)
Failure Logs
init: the tokens of sequence 0 in the input batch have inconsistent sequence positions:
- the last position stored in the memory module of the context (i.e. the KV cache) for sequence 0 is X = 519
- the tokens for sequence 0 in the input batch have a starting position of Y = 29
it is required that the sequence positions remain consecutive: Y = X + 1
decode: failed to initialize batch
llama_decode: failed to decode, ret = -1