
Improve gpt-realtime token counting/Fix cost tracking via OTEL spans #5121

Open
bml1g12 wants to merge 10 commits into livekit:main from bml1g12:fix_gpt_realtime_counting

Conversation


@bml1g12 bml1g12 commented Mar 16, 2026

Context: I realised that our gpt-realtime cost estimates shown in LangFuse (which are derived from token counts) were almost double what OpenAI was actually billing us, and investigated why.

Cause: OpenAI reports the total input token count inclusive of cached tokens. LangFuse can only calculate costs by multiplying a span attribute by a $ price, so it is currently impossible to compute the correct price: that would require the uncached token count as a raw attribute. This is a known limitation: langfuse/langfuse#10592

Proposed Solution in this PR: Emit an attribute containing the uncached token count, which can optionally be used in cost-calculation post-processing.

The problem is that, prior to this PR, the input_text_tokens attribute was inclusive of the cached token count, meaning cached tokens were double-counted in the eventual cost calculation. So this PR makes two changes:

  • Fix: Set input_text_tokens and input_audio_tokens to mean the uncached portion of text/audio tokens, which is how LangFuse actually interprets them
  • Improvement: Split input_cached_tokens into input_cached_text_tokens and input_cached_audio_tokens, as gpt-realtime distinguishes these in its usage metrics (albeit currently priced the same)
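The two changes above can be sketched as a small helper that derives the span attributes from an OpenAI realtime usage payload. This is illustrative only: the field names (input_token_details, cached_tokens_details, etc.) follow the shape of OpenAI's realtime usage object as I understand it, and the gen_ai.usage.* attribute names are assumptions based on the PR description, not the exact implementation.

```python
def split_token_attributes(usage: dict) -> dict:
    """Derive uncached/cached per-modality token attributes from a usage payload.

    Sketch of the attribute split this PR proposes: each attribute counts one
    disjoint bucket, so a cost post-processor can multiply each by a single price.
    """
    details = usage["input_token_details"]
    cached = details.get("cached_tokens_details", {})
    cached_text = cached.get("text_tokens", 0)
    cached_audio = cached.get("audio_tokens", 0)
    return {
        # Uncached share: per-modality total minus the cached portion.
        "gen_ai.usage.input_text_tokens": details["text_tokens"] - cached_text,
        "gen_ai.usage.input_audio_tokens": details["audio_tokens"] - cached_audio,
        # Cached tokens, split by modality as gpt-realtime reports them.
        "gen_ai.usage.input_text_cached_tokens": cached_text,
        "gen_ai.usage.input_audio_cached_tokens": cached_audio,
    }


# Example payload: 1200 text tokens of which 1000 were served from cache.
usage = {
    "input_token_details": {
        "text_tokens": 1200,
        "audio_tokens": 300,
        "cached_tokens_details": {"text_tokens": 1000, "audio_tokens": 0},
    }
}
attrs = split_token_attributes(usage)
# input_text_tokens is now 200 (the uncached share), not 1200
```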

Example of usage

LangFuse themselves provide model<>cost definition like the below screenshot:

[Screenshot: LangFuse model cost definition UI]

Or users can define their own; either way, after this PR input_text_tokens correctly represents the uncached input tokens, whereas before this PR it incorrectly included cached tokens.

Note: I focus on LangFuse here, but this PR is likely useful for any provider that calculates costs in a single operation, i.e. taking token counts and multiplying them by a cost, so I think it's generally useful even for non-LangFuse users.
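To make the double counting concrete, here is the cost arithmetic before and after the fix, using hypothetical per-token prices (not real OpenAI rates) and the tiered pricing a LangFuse-style processor applies: uncached tokens at the full rate, cached tokens at a discounted rate.

```python
# Hypothetical prices for illustration only (not real OpenAI rates).
PRICE_INPUT = 4.00 / 1_000_000   # $/token for uncached text input
PRICE_CACHED = 0.40 / 1_000_000  # $/token for cached text input

total_text = 1200    # total text tokens reported by OpenAI (includes cached)
cached_text = 1000   # cached text tokens

# Before this PR: input_text_tokens included cached tokens, so the cached
# portion was billed at BOTH the full rate and the cached rate.
cost_before = total_text * PRICE_INPUT + cached_text * PRICE_CACHED

# After this PR: input_text_tokens is only the uncached share, so each
# token is priced exactly once.
cost_after = (total_text - cached_text) * PRICE_INPUT + cached_text * PRICE_CACHED
```

With these numbers the pre-PR estimate is $0.0052 versus a correct $0.0012, reproducing the "almost double (or worse)" overestimate described above whenever the cache hit rate is high.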

bml1g12 added 5 commits March 16, 2026 16:48
…ribute name to clarify its text cached token

gen_ai.usage.input_cached_tokens --> gen_ai.usage.input_text_cached_tokens, so it's distinct from audio cached tokens.
LangFuse uses e.g. input_cached_audio_tokens as the name in their maintained model price definitions.
@bml1g12 bml1g12 marked this pull request as ready for review March 17, 2026 17:20
@bml1g12 bml1g12 changed the title feat(telemetry): add uncached token counting for use in LangFuse cost… feat(telemetry): Improve gpt-realtime token counting/cost tracking via OTEL spans Mar 17, 2026
@bml1g12 bml1g12 marked this pull request as draft March 17, 2026 17:29
@bml1g12 bml1g12 marked this pull request as ready for review March 18, 2026 11:04
@bml1g12 bml1g12 marked this pull request as draft March 18, 2026 11:08
@bml1g12 bml1g12 marked this pull request as ready for review March 18, 2026 11:10
@bml1g12 bml1g12 changed the title feat(telemetry): Improve gpt-realtime token counting/cost tracking via OTEL spans Improve gpt-realtime token counting/Fix cost tracking via OTEL spans Mar 18, 2026