# azure-ai-agentserver: Support background mode and resumable streaming for hosted agents #46015
## Description

**Feature Request:** Support `background: true` and resumable streaming (`GET /responses/{id}?stream=true&starting_after=N`) for hosted agents, matching the behavior already available for direct model invocations via the Foundry Responses API.
## Current Behavior

### Direct model invocation (works)

When calling the Responses API directly with a model deployment:
`POST /openai/responses`

```json
{
  "model": "gpt-4o",
  "stream": true,
  "store": true,
  "background": true,
  "input": [{"role": "user", "content": "Hello"}]
}
```

- Returns immediately with `status: "in_progress"` and `"background": true`
- Every SSE event includes a `sequence_number`
- `GET /responses/{id}?stream=true&starting_after=0` replays all stored events, then continues with live events if still in progress
- Response lifecycle completes correctly (`status: "completed"`, `output` populated)
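On the client side, the `sequence_number` carried in each SSE event is what makes resumption possible: a client tracks the highest value it has seen and passes it back as `starting_after`. The sketch below is a minimal illustration of that bookkeeping, assuming the event names shown and abbreviated payload shapes; it parses a simulated slice of the SSE wire format rather than a live connection.

```python
import json


def parse_sse_events(raw: str):
    """Parse a raw SSE stream into (event_type, payload) pairs.

    Each Responses API stream event carries a sequence_number in its
    JSON payload, which is what starting_after resumes from.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        event_type, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event_type = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if data is not None:
            events.append((event_type, data))
    return events


# Simulated slice of a background stream (payloads abbreviated;
# the real events carry full response/delta objects).
raw = (
    "event: response.created\n"
    'data: {"sequence_number": 0, "response": {"status": "in_progress"}}\n\n'
    "event: response.output_text.delta\n"
    'data: {"sequence_number": 1, "delta": "Hel"}\n\n'
    "event: response.output_text.delta\n"
    'data: {"sequence_number": 2, "delta": "lo"}\n\n'
)

events = parse_sse_events(raw)
last_seq = max(payload["sequence_number"] for _, payload in events)
print(last_seq)  # 2 -- the value to pass as starting_after on resume
```

A real client would feed reassembled SSE frames from an HTTP response into the same loop; only the transport differs.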
### Hosted agent invocation (does not work)

When calling the same Responses API with an `agent_reference` pointing to a hosted agent:
`POST /openai/responses`

```json
{
  "model": "gpt-5",
  "stream": true,
  "store": true,
  "background": true,
  "agent_reference": {"type": "agent_reference", "name": "my-agent"},
  "input": [{"role": "user", "content": "Hello"}]
}
```

- Returns immediately with `status: "in_progress"`, but the stored response has `"background": false`: the flag is silently dropped
- `GET /responses/{id}?stream=true&starting_after=0` returns `"Streaming is not enabled for this response"`
- The response object remains stuck at `status: "in_progress"` with an empty `output`, even after the agent completes and conversation items are saved
- The `azure-ai-agentserver-core` SDK (v1.0.0b16) has no reference to `background` anywhere in its source
## Why This Matters

The primary use case is resumable/rejoinable streaming for chat UIs. When a user:
- Refreshes the page mid-generation
- Loses network connectivity temporarily
- Opens a conversation that is still being generated in another tab
They should be able to call `GET /responses/{id}?stream=true&starting_after=0` to replay past events and continue receiving live events. This works today for direct model calls but not for hosted agents, despite both using the same `/openai/responses` API surface.
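The reconnect flow a chat UI needs is small: remember the last `sequence_number` yielded, and on any disconnect re-issue the GET with that value as `starting_after`. Below is a hedged sketch of that loop; `fake_fetch` is a hypothetical stand-in for `GET /responses/{id}?stream=true&starting_after=N` so the logic can run without a live endpoint.

```python
def stream_with_resume(fetch, start_after=0, max_reconnects=3):
    """Consume a resumable stream, reconnecting from the last
    sequence_number seen so no event is duplicated or lost.

    fetch(starting_after) stands in for the resumable GET; a real
    client would open an SSE request here instead.
    """
    last_seq = start_after
    for attempt in range(max_reconnects + 1):
        try:
            for event in fetch(last_seq):
                last_seq = max(last_seq, event["sequence_number"])
                yield event
            return  # stream completed normally
        except ConnectionError:
            if attempt == max_reconnects:
                raise
            # else: loop around and resume from last_seq


# Fake server: holds five events and drops the connection once,
# just before delivering event 3.
state = {"dropped": False}

def fake_fetch(starting_after):
    events = [{"sequence_number": n, "delta": f"chunk{n}"} for n in range(1, 6)]
    for e in events:
        if e["sequence_number"] <= starting_after:
            continue  # replay filter, as starting_after implies
        if e["sequence_number"] == 3 and not state["dropped"]:
            state["dropped"] = True
            raise ConnectionError("network blip")
        yield e


seen = [e["sequence_number"] for e in stream_with_resume(fake_fetch)]
print(seen)  # [1, 2, 3, 4, 5] -- gap-free despite the mid-stream drop
```

The same loop covers the page-refresh case: persist `last_seq` client-side and call the generator again with `start_after` set to it.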
## Observations

- The SDK already assigns a `sequence_number` to every `ResponseStreamEvent` via `StreamEventState`, so the primitive for `starting_after` resumption is already in place
- The SDK already supports `store=true`, which saves completed items to the Conversations API after stream completion
- The gap appears to be in Foundry's proxy layer between the Responses API endpoint and the hosted agent container: it doesn't buffer/store SSE events as they pass through for hosted agents the way it does for direct model calls
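To make the suspected gap concrete, here is a minimal sketch of the replay-then-follow-live buffering the proxy layer would need. It is an illustration only, not Foundry's actual implementation: events are appended as they pass through, and a reader started at any point replays stored events after `starting_after` and then blocks for live ones until the stream closes.

```python
import threading


class EventBuffer:
    """Hypothetical pass-through buffer: store each SSE event so a
    later GET ...?starting_after=N can replay and then follow live."""

    def __init__(self):
        self._events = []      # events in arrival order
        self._done = False
        self._cond = threading.Condition()

    def publish(self, event):
        """Called as each event flows from the agent container."""
        with self._cond:
            self._events.append(event)
            self._cond.notify_all()

    def close(self):
        """Mark the upstream stream as finished."""
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def replay(self, starting_after):
        """Yield stored events after starting_after, then live ones."""
        i = 0
        while True:
            with self._cond:
                while i >= len(self._events) and not self._done:
                    self._cond.wait()  # block for live events
                if i >= len(self._events) and self._done:
                    return
                event = self._events[i]
            i += 1
            if event["sequence_number"] > starting_after:
                yield event


buf = EventBuffer()
for n in (1, 2, 3):
    buf.publish({"sequence_number": n})
buf.close()

print([e["sequence_number"] for e in buf.replay(0)])  # [1, 2, 3]
print([e["sequence_number"] for e in buf.replay(2)])  # [3]
```

Direct model calls evidently get this behavior today; the ask is that the same buffering apply when the upstream producer is a hosted agent container.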
## Expected Behavior

`background: true` + `stream: true` should work identically for hosted agents as it does for direct model calls:

- Foundry buffers SSE events as they pass through from the hosted agent container
- `GET /responses/{id}?stream=true&starting_after=N` replays stored events and continues with live events
- The response lifecycle completes correctly when the agent finishes
## Environment

- `azure-ai-agentserver-core==1.0.0b16`
- `azure-ai-agentserver-langgraph==1.0.0b16`
- API version: `2025-11-15-preview`