azure tts: add streaming synthesis via Speech SDK WebSocket V2 #5126
Open
abhishekranjan-bluemachines wants to merge 10 commits into livekit:main from
Add SynthesizeStream class that uses Azure Speech SDK's text streaming API (WebSocket V2) to feed LLM tokens incrementally into the synthesizer, reducing time-to-first-audio-byte from ~550-1350ms to ~100-200ms.
- New stream() method using SpeechSynthesisRequest with TextStream input
- Two-task pipeline: sentence tokenization + sequential segment synthesis
- SDK callback-to-asyncio bridge via call_soon_threadsafe + asyncio.Queue
- Pre-connects WebSocket per segment for lower TTFB
- Default remains use_streaming=False (REST API) for backward compatibility
- Opt-in with use_streaming=True; SSML features not supported in streaming mode
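The callback-to-asyncio bridge listed above can be sketched with the standard library alone. This is a minimal illustration, not the plugin's code: `fake_sdk_synthesize`, `on_audio`, `on_done`, and `collect_audio` are hypothetical names, and a plain worker thread stands in for the Speech SDK's callback thread.

```python
import asyncio
import threading


def fake_sdk_synthesize(on_audio, on_done):
    """Hypothetical stand-in for an SDK that fires callbacks from its own thread."""

    def worker():
        for i in range(3):
            on_audio(f"chunk-{i}".encode())
        on_done()

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


async def collect_audio():
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    # asyncio.Queue is not thread-safe, so the callbacks never touch it directly.
    # They ask the event loop to run put_nowait on the loop's own thread.
    def on_audio(data: bytes) -> None:
        loop.call_soon_threadsafe(queue.put_nowait, data)

    def on_done() -> None:
        # None is used here as an end-of-stream sentinel (an assumption of this sketch).
        loop.call_soon_threadsafe(queue.put_nowait, None)

    thread = fake_sdk_synthesize(on_audio, on_done)

    chunks = []
    while (item := await queue.get()) is not None:
        chunks.append(item)
    thread.join()
    return chunks


chunks = asyncio.run(collect_audio())
```

Because `call_soon_threadsafe` schedules callbacks in the order they are submitted, chunks pushed from a single SDK thread reach the consumer in the order they were produced.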
- Run _feed_text and _consume_audio as concurrent tasks so audio chunks are pushed to the emitter as they arrive from the SDK, not after all text has been fed
- Fix get_ws_endpoint_url to prioritize speech_endpoint over region, matching the existing get_endpoint_url behavior
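The effect of running the feeder and the consumer concurrently can be illustrated with a small asyncio sketch. Everything here is an assumption made for illustration: `run_segment`, `fake_synthesizer`, and the two queues are hypothetical stand-ins for the SDK's text stream and audio callbacks, and the 10 ms sleep simulates waiting on LLM tokens.

```python
import asyncio


async def run_segment(tokens):
    text_in: asyncio.Queue = asyncio.Queue()    # stands in for the SDK text stream
    audio_out: asyncio.Queue = asyncio.Queue()  # stands in for the audio callback queue
    events = []

    async def fake_synthesizer():
        # Hypothetical synthesizer: emits one "audio" chunk per text token.
        while (tok := await text_in.get()) is not None:
            await audio_out.put(f"audio({tok})")
        await audio_out.put(None)

    async def feed_text():
        for tok in tokens:
            await text_in.put(tok)
            events.append(f"fed {tok}")
            await asyncio.sleep(0.01)  # simulated gap between LLM tokens
        await text_in.put(None)  # None marks the end of the text

    async def consume_audio():
        # Runs alongside feed_text, so chunks are handled as soon as they arrive.
        while (chunk := await audio_out.get()) is not None:
            events.append(f"emit {chunk}")

    await asyncio.gather(fake_synthesizer(), feed_text(), consume_audio())
    return events


events = asyncio.run(run_segment(["a", "b", "c"]))
```

In the recorded `events`, the first audio chunk is emitted before the last token has been fed. Running the two steps sequentially would delay every emit until after all text had been fed.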
Overrides chat() to inject max_output_tokens into extra_kwargs, allowing token limit control for Azure OpenAI Responses API calls.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
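The override pattern described in this commit could look roughly like the sketch below. `BaseLLM` and `TokenLimitedLLM` are hypothetical classes, not the plugin's real types, and the real chat() signature may differ. Letting an explicit per-call value win over the instance default is also an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any


class BaseLLM:
    """Hypothetical stand-in for the plugin's LLM base class."""

    def chat(self, *, prompt: str, extra_kwargs: dict[str, Any] | None = None) -> dict[str, Any]:
        # Returns the request it would send, so the injection is easy to inspect.
        return {"prompt": prompt, **(extra_kwargs or {})}


class TokenLimitedLLM(BaseLLM):
    def __init__(self, max_output_tokens: int | None = None) -> None:
        self._max_output_tokens = max_output_tokens

    def chat(self, *, prompt: str, extra_kwargs: dict[str, Any] | None = None) -> dict[str, Any]:
        merged = dict(extra_kwargs or {})  # copy so the caller's dict is not mutated
        if self._max_output_tokens is not None:
            # setdefault keeps an explicit per-call value if one was passed.
            merged.setdefault("max_output_tokens", self._max_output_tokens)
        return super().chat(prompt=prompt, extra_kwargs=merged)
```

Copying `extra_kwargs` before merging avoids leaking the injected limit into a dictionary the caller may reuse across calls.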
CI re-run: blockguard-tests (macos-latest) failed on a flaky timing-sensitive test ( |
Summary
- Add SynthesizeStream class to the Azure TTS plugin that uses the Azure Speech SDK's text streaming API (WebSocket V2 endpoint) to feed LLM tokens incrementally into the synthesizer
- Default is unchanged (use_streaming=False); opt-in via use_streaming=True
Details
- New stream() method returning SynthesizeStream; uses SpeechSynthesisRequest with TextStream input type on wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2
- SDK callback-to-asyncio bridge via loop.call_soon_threadsafe() + asyncio.Queue
- synthesize() / ChunkedStream REST API path fully preserved
Test plan
- use_streaming=False behavior is unchanged (REST API path)
- use_streaming=True with speech_key + speech_region auth
- use_streaming=True with speech_auth_token + speech_region auth
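As a rough illustration of the endpoint selection this PR describes (an explicit speech_endpoint takes priority over the region-derived URL), a helper might look like the sketch below. The function body, its parameter names, and the error message are assumptions. Only the URL template and the priority order come from the PR text, and whether the real helper returns the custom endpoint verbatim is not confirmed here.

```python
from __future__ import annotations

WS_V2_TEMPLATE = "wss://{region}.tts.speech.microsoft.com/cognitiveservices/websocket/v2"


def get_ws_endpoint_url(
    speech_endpoint: str | None = None,
    speech_region: str | None = None,
) -> str:
    """Sketch: prefer an explicit endpoint, otherwise derive the URL from the region."""
    if speech_endpoint:
        # Assumption: a caller-supplied endpoint is used as-is.
        return speech_endpoint
    if speech_region:
        return WS_V2_TEMPLATE.format(region=speech_region)
    raise ValueError("either speech_endpoint or speech_region must be provided")
```

For example, `get_ws_endpoint_url(speech_region="westus2")` fills the region into the template, while passing both arguments returns the explicit endpoint.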