
Improve gpt-realtime token counting/Fix cost tracking via OTEL spans #5121

Open
bml1g12 wants to merge 10 commits into livekit:main from bml1g12:fix_gpt_realtime_counting

Conversation


@bml1g12 bml1g12 commented Mar 16, 2026

Context: I realised that our gpt-realtime cost estimates shown in LangFuse (which are derived from token counts) were almost double what OpenAI was actually billing us, and investigated why.

Cause: OpenAI reports the total input token count inclusive of cached tokens. LangFuse can only calculate costs by multiplying a span attribute by a $ price, so it is currently impossible to compute the correct price: that would require the uncached token count as a raw attribute. This is a known limitation: langfuse/langfuse#10592

Proposed Solution in this PR: Emit an attribute containing the uncached token count, which can optionally be used in cost-calculation post-processing.

The problem is that, prior to this PR, the input_text_tokens attribute was inclusive of the cached token count, meaning cached tokens were double-counted in the eventual cost calculation. So this PR makes two changes:

  • Fix: Set input_text_tokens and input_audio_tokens to mean the uncached portion of text/audio tokens, which is how LangFuse actually interprets them
  • Improvement: Split input_cached_tokens into input_cached_text_tokens and input_cached_audio_tokens, as gpt-realtime distinguishes these in its usage metrics (albeit currently priced the same)
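The two changes above can be sketched as a small helper that derives the span attributes from an OpenAI realtime usage payload. This is illustrative only: the field names (input_token_details, cached_tokens_details, etc.) follow the shape of OpenAI's realtime usage object as I understand it, and the gen_ai.usage.* attribute names are assumptions based on the PR description, not the exact implementation.

```python
def split_token_attributes(usage: dict) -> dict:
    """Derive uncached/cached per-modality token attributes from a usage payload.

    Sketch of the attribute split this PR proposes: each attribute counts one
    disjoint bucket, so a cost post-processor can multiply each by a single price.
    """
    details = usage["input_token_details"]
    cached = details.get("cached_tokens_details", {})
    cached_text = cached.get("text_tokens", 0)
    cached_audio = cached.get("audio_tokens", 0)
    return {
        # Uncached share: per-modality total minus the cached portion.
        "gen_ai.usage.input_text_tokens": details["text_tokens"] - cached_text,
        "gen_ai.usage.input_audio_tokens": details["audio_tokens"] - cached_audio,
        # Cached tokens, split by modality as gpt-realtime reports them.
        "gen_ai.usage.input_text_cached_tokens": cached_text,
        "gen_ai.usage.input_audio_cached_tokens": cached_audio,
    }


# Example payload: 1200 text tokens of which 1000 were served from cache.
usage = {
    "input_token_details": {
        "text_tokens": 1200,
        "audio_tokens": 300,
        "cached_tokens_details": {"text_tokens": 1000, "audio_tokens": 0},
    }
}
attrs = split_token_attributes(usage)
# input_text_tokens is now 200 (the uncached share), not 1200
```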

Example of usage

LangFuse themselves provide model<>cost definition like the below screenshot:

[Screenshot: LangFuse model cost definition UI]

Or users can define their own; either way, after this PR input_text_tokens correctly represents the uncached input tokens, whereas before this PR it incorrectly included cached tokens.

Note: I focus on LangFuse here, but this PR is likely useful for any provider that calculates costs in a single operation, i.e. taking token counts and multiplying them by a cost, so I think it's generally useful even for non-LangFuse users.
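To make the double counting concrete, here is the cost arithmetic before and after the fix, using hypothetical per-token prices (not real OpenAI rates) and the tiered pricing a LangFuse-style processor applies: uncached tokens at the full rate, cached tokens at a discounted rate.

```python
# Hypothetical prices for illustration only (not real OpenAI rates).
PRICE_INPUT = 4.00 / 1_000_000   # $/token for uncached text input
PRICE_CACHED = 0.40 / 1_000_000  # $/token for cached text input

total_text = 1200    # total text tokens reported by OpenAI (includes cached)
cached_text = 1000   # cached text tokens

# Before this PR: input_text_tokens included cached tokens, so the cached
# portion was billed at BOTH the full rate and the cached rate.
cost_before = total_text * PRICE_INPUT + cached_text * PRICE_CACHED

# After this PR: input_text_tokens is only the uncached share, so each
# token is priced exactly once.
cost_after = (total_text - cached_text) * PRICE_INPUT + cached_text * PRICE_CACHED
```

With these numbers the pre-PR estimate is $0.0052 versus a correct $0.0012, reproducing the "almost double (or worse)" overestimate described above whenever the cache hit rate is high.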

bml1g12 added 5 commits March 16, 2026 16:48
…ribute name to clarify its text cached token

gen_ai.usage.input_cached_tokens --> gen_ai.usage.input_text_cached_tokens, so it's distinct from audio cached tokens.
LangFuse uses e.g. input_cached_audio_tokens as the name in their maintained model price definitions.
@bml1g12 bml1g12 marked this pull request as ready for review March 17, 2026 17:20
@bml1g12 bml1g12 changed the title feat(telemetry): add uncached token counting for use in LangFuse cost… feat(telemetry): Improve gpt-realtime token counting/cost tracking via OTEL spans Mar 17, 2026
@bml1g12 bml1g12 marked this pull request as draft March 17, 2026 17:29
@bml1g12 bml1g12 marked this pull request as ready for review March 18, 2026 11:04
@bml1g12 bml1g12 marked this pull request as draft March 18, 2026 11:08
@bml1g12 bml1g12 marked this pull request as ready for review March 18, 2026 11:10
@bml1g12 bml1g12 changed the title feat(telemetry): Improve gpt-realtime token counting/cost tracking via OTEL spans Improve gpt-realtime token counting/Fix cost tracking via OTEL spans Mar 18, 2026