
azure tts: add streaming synthesis via Speech SDK WebSocket V2 #5126

Open
abhishekranjan-bluemachines wants to merge 10 commits into livekit:main from abhishekranjan-bluemachines:feat/azure-tts-streaming

Conversation

abhishekranjan-bluemachines (Contributor) commented Mar 17, 2026

Summary

  • Add SynthesizeStream class to the Azure TTS plugin that uses the Azure Speech SDK's text streaming API (WebSocket V2 endpoint) to feed LLM tokens incrementally into the synthesizer
  • Reduces time-to-first-audio-byte from ~550-1350ms (REST API, full sentence buffering) to ~100-200ms (streaming, progressive token feeding)
  • Default behavior remains unchanged (use_streaming=False); opt-in via use_streaming=True

Details

  • New stream() method returning SynthesizeStream — uses SpeechSynthesisRequest with TextStream input type on wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2
  • Two-task pipeline following the same pattern as the Google TTS plugin: sentence tokenization → sequential segment synthesis
  • Bridges Azure SDK callbacks (background thread) to asyncio via loop.call_soon_threadsafe() + asyncio.Queue
  • Pre-connects WebSocket per segment for lower TTFB
  • Warns if SSML features (prosody, style, lexicon) are configured with streaming enabled, since text streaming does not support SSML
  • Existing synthesize() / ChunkedStream REST API path fully preserved
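
The callback-to-asyncio bridge described above can be sketched as follows. This is a minimal, self-contained illustration of the `loop.call_soon_threadsafe()` + `asyncio.Queue` pattern; `fake_sdk_synthesize` is a stand-in for the Azure Speech SDK's background-thread callbacks, not the real API:

```python
import asyncio
import threading


def fake_sdk_synthesize(on_chunk, on_done):
    """Stand-in for the SDK: fires callbacks from a worker thread."""
    def run():
        for chunk in (b"aud1", b"aud2", b"aud3"):
            on_chunk(chunk)
        on_done()

    threading.Thread(target=run).start()


async def consume_audio() -> list[bytes]:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    # SDK callbacks run on a background thread; call_soon_threadsafe is the
    # only safe way to hand each chunk to the event loop from that thread.
    fake_sdk_synthesize(
        on_chunk=lambda data: loop.call_soon_threadsafe(queue.put_nowait, data),
        on_done=lambda: loop.call_soon_threadsafe(queue.put_nowait, None),
    )

    chunks = []
    while (item := await queue.get()) is not None:  # None marks end of stream
        chunks.append(item)
    return chunks
```

A `None` sentinel on the queue signals completion, so the consumer loop exits cleanly without polling the SDK.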

Test plan

  • Verify default use_streaming=False behavior is unchanged (REST API path)
  • Test use_streaming=True with speech_key + speech_region auth
  • Test use_streaming=True with speech_auth_token + speech_region auth
  • Verify SSML warning is logged when streaming + prosody/style/lexicon configured
  • Measure TTFB improvement in a cascaded STT→LLM→TTS voice agent pipeline
  • Test cancellation and error handling (invalid key, network timeout)
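
For the TTFB measurement, a minimal harness can time the gap between starting the stream and receiving the first chunk. The `fake_tts_stream` generator below is a placeholder for the plugin's audio stream, with an artificial 50 ms latency standing in for network plus synthesis delay:

```python
import asyncio
import time


async def fake_tts_stream():
    await asyncio.sleep(0.05)  # stand-in for network + synthesis latency
    yield b"first-chunk"
    yield b"rest-of-audio"


async def measure_ttfb(stream) -> float:
    """Return seconds from synthesis start to the first audio chunk."""
    start = time.perf_counter()
    async for _chunk in stream:
        return time.perf_counter() - start  # TTFB: delay to first chunk
    raise RuntimeError("stream produced no audio")
```

Run this once against the REST path and once with `use_streaming=True` to quantify the improvement claimed in the summary.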

Add SynthesizeStream class that uses Azure Speech SDK's text streaming
API (WebSocket V2) to feed LLM tokens incrementally into the synthesizer,
reducing time-to-first-audio-byte from ~550-1350ms to ~100-200ms.

- New stream() method using SpeechSynthesisRequest with TextStream input
- Two-task pipeline: sentence tokenization + sequential segment synthesis
- SDK callback-to-asyncio bridge via call_soon_threadsafe + asyncio.Queue
- Pre-connects WebSocket per segment for lower TTFB
- Default remains use_streaming=False (REST API) for backward compatibility
- Opt-in with use_streaming=True; SSML features not supported in streaming mode

CLAassistant commented Mar 17, 2026

CLA assistant check
All committers have signed the CLA.

devin-ai-integration[bot] commented (comment marked as resolved)

- Run _feed_text and _consume_audio as concurrent tasks so audio chunks
  are pushed to the emitter as they arrive from the SDK, not after all
  text has been fed
- Fix get_ws_endpoint_url to prioritize speech_endpoint over region,
  matching the existing get_endpoint_url behavior
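
The fix described in this commit, running the text feeder and the audio consumer concurrently so chunks are emitted as they arrive, can be illustrated with a pure-asyncio sketch. The `_feed_text`, `_consume_audio`, and `synthesize` names below are illustrative stand-ins, not the plugin's actual internals:

```python
import asyncio


async def _feed_text(text_q: asyncio.Queue, tokens: list[str]) -> None:
    """Push LLM tokens into the input stream, then close it with a sentinel."""
    for tok in tokens:
        await text_q.put(tok)
    await text_q.put(None)


async def _consume_audio(text_q: asyncio.Queue, out: list[str]) -> None:
    """Emit an audio chunk per token as it arrives, not after all text is fed."""
    while (tok := await text_q.get()) is not None:
        out.append(f"audio({tok})")


async def synthesize(tokens: list[str]) -> list[str]:
    text_q: asyncio.Queue = asyncio.Queue()
    out: list[str] = []
    # gather() runs both tasks concurrently; audio flows while text is still
    # being fed, which is the behavior change this commit introduces.
    await asyncio.gather(_feed_text(text_q, tokens), _consume_audio(text_q, out))
    return out
```

Running the two coroutines sequentially instead of via `gather()` would reproduce the original bug: no audio until every token had been fed.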
devin-ai-integration[bot] commented (comment marked as resolved)

Overrides chat() to inject max_output_tokens into extra_kwargs,
allowing token limit control for Azure OpenAI Responses API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
abhishekranjan-bluemachines (Contributor, Author) commented:

CI re-run: blockguard-tests (macos-latest) failed on a flaky timing-sensitive test (test_many_short_blocks) unrelated to this PR. Pushing this comment to retrigger CI.
