
azure tts: add streaming synthesis via Speech SDK WebSocket V2 #5126

Open
abhishekranjan-bluemachines wants to merge 10 commits into livekit:main from abhishekranjan-bluemachines:feat/azure-tts-streaming

Conversation

abhishekranjan-bluemachines (Contributor) commented Mar 17, 2026

Summary

  • Add SynthesizeStream class to the Azure TTS plugin that uses the Azure Speech SDK's text streaming API (WebSocket V2 endpoint) to feed LLM tokens incrementally into the synthesizer
  • Reduces time-to-first-audio-byte from ~550-1350ms (REST API, full sentence buffering) to ~100-200ms (streaming, progressive token feeding)
  • Default behavior remains unchanged (use_streaming=False); opt-in via use_streaming=True

Details

  • New stream() method returning SynthesizeStream — uses SpeechSynthesisRequest with TextStream input type on wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2
  • Two-task pipeline following the same pattern as the Google TTS plugin: sentence tokenization → sequential segment synthesis
  • Bridges Azure SDK callbacks (background thread) to asyncio via loop.call_soon_threadsafe() + asyncio.Queue
  • Pre-connects WebSocket per segment for lower TTFB
  • Warns if SSML features (prosody, style, lexicon) are configured with streaming enabled, since text streaming does not support SSML
  • Existing synthesize() / ChunkedStream REST API path fully preserved
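
The callback-to-asyncio bridge described above can be sketched as follows. This is a minimal, self-contained illustration of the `loop.call_soon_threadsafe()` + `asyncio.Queue` pattern; `fake_sdk_synthesize` is a stand-in for the Azure Speech SDK's background-thread callbacks, not the real API:

```python
import asyncio
import threading


def fake_sdk_synthesize(on_chunk, on_done):
    """Stand-in for the SDK: fires callbacks from a worker thread."""
    def run():
        for chunk in (b"aud1", b"aud2", b"aud3"):
            on_chunk(chunk)
        on_done()

    threading.Thread(target=run).start()


async def consume_audio() -> list[bytes]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    # SDK callbacks run on a background thread; call_soon_threadsafe is the
    # only safe way to hand each chunk to the event loop from that thread.
    fake_sdk_synthesize(
        on_chunk=lambda data: loop.call_soon_threadsafe(queue.put_nowait, data),
        on_done=lambda: loop.call_soon_threadsafe(queue.put_nowait, None),
    )

    chunks = []
    while (item := await queue.get()) is not None:  # None marks end of stream
        chunks.append(item)
    return chunks
```

A `None` sentinel on the queue signals completion, so the consumer loop exits cleanly without polling the SDK.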

Test plan

  • Verify default use_streaming=False behavior is unchanged (REST API path)
  • Test use_streaming=True with speech_key + speech_region auth
  • Test use_streaming=True with speech_auth_token + speech_region auth
  • Verify SSML warning is logged when streaming + prosody/style/lexicon configured
  • Measure TTFB improvement in a cascaded STT→LLM→TTS voice agent pipeline
  • Test cancellation and error handling (invalid key, network timeout)
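
For the TTFB measurement, a minimal harness can time the gap between starting the stream and receiving the first chunk. The `fake_tts_stream` generator below is a placeholder for the plugin's audio stream, with an artificial 50 ms latency standing in for network plus synthesis delay:

```python
import asyncio
import time


async def fake_tts_stream():
    await asyncio.sleep(0.05)  # stand-in for network + synthesis latency
    yield b"first-chunk"
    yield b"rest-of-audio"


async def measure_ttfb(stream) -> float:
    """Return seconds from synthesis start to the first audio chunk."""
    start = time.perf_counter()
    async for _chunk in stream:
        return time.perf_counter() - start  # TTFB: delay to first chunk
    raise RuntimeError("stream produced no audio")
```

Run this once against the REST path and once with `use_streaming=True` to quantify the improvement claimed in the summary.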

Add SynthesizeStream class that uses Azure Speech SDK's text streaming
API (WebSocket V2) to feed LLM tokens incrementally into the synthesizer,
reducing time-to-first-audio-byte from ~550-1350ms to ~100-200ms.

- New stream() method using SpeechSynthesisRequest with TextStream input
- Two-task pipeline: sentence tokenization + sequential segment synthesis
- SDK callback-to-asyncio bridge via call_soon_threadsafe + asyncio.Queue
- Pre-connects WebSocket per segment for lower TTFB
- Default remains use_streaming=False (REST API) for backward compatibility
- Opt-in with use_streaming=True; SSML features not supported in streaming mode

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot] commented (comment marked as resolved)

- Run _feed_text and _consume_audio as concurrent tasks so audio chunks
  are pushed to the emitter as they arrive from the SDK, not after all
  text has been fed
- Fix get_ws_endpoint_url to prioritize speech_endpoint over region,
  matching the existing get_endpoint_url behavior
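
The fix described in this commit, running the text feeder and the audio consumer concurrently so chunks are emitted as they arrive, can be illustrated with a pure-asyncio sketch. The `_feed_text`, `_consume_audio`, and `synthesize` names below are illustrative stand-ins, not the plugin's actual internals:

```python
import asyncio


async def _feed_text(text_q: asyncio.Queue, tokens: list[str]) -> None:
    """Push LLM tokens into the input stream, then close it with a sentinel."""
    for tok in tokens:
        await text_q.put(tok)
    await text_q.put(None)


async def _consume_audio(text_q: asyncio.Queue, out: list[str]) -> None:
    """Emit an audio chunk per token as it arrives, not after all text is fed."""
    while (tok := await text_q.get()) is not None:
        out.append(f"audio({tok})")


async def synthesize(tokens: list[str]) -> list[str]:
    text_q: asyncio.Queue = asyncio.Queue()
    out: list[str] = []
    # gather() runs both tasks concurrently; audio flows while text is still
    # being fed, which is the behavior change this commit introduces.
    await asyncio.gather(_feed_text(text_q, tokens), _consume_audio(text_q, out))
    return out
```

Running the two coroutines sequentially instead of via `gather()` would reproduce the original bug: no audio until every token had been fed.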
devin-ai-integration[bot] commented (comment marked as resolved)

Overrides chat() to inject max_output_tokens into extra_kwargs,
allowing token limit control for Azure OpenAI Responses API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abhishekranjan-bluemachines (Contributor, Author) commented:

CI re-run: blockguard-tests (macos-latest) failed on a flaky timing-sensitive test (test_many_short_blocks) unrelated to this PR. Pushing this comment to retrigger CI.
