fix: synchronize ClientTaskManager to prevent race condition in concu…#742
fix: synchronize ClientTaskManager to prevent race condition in concu…#742kabir wants to merge 2 commits intoa2aproject:mainfrom
Conversation
…rrent SSE event processing ClientTaskManager was not thread-safe, causing intermittent test failures when SSE events were delivered concurrently by the HTTP client. The race condition occurred when multiple events (startWork, addArtifact, complete) were emitted rapidly in streaming mode. Without synchronization, concurrent threads could interleave read-modify-write operations on currentTask, resulting in lost updates. Specifically, the artifact update could be overwritten by a status update, leaving the final COMPLETED task with no artifacts. This manifested as intermittent failures in agent-to-agent tests where the test received a COMPLETED task but extractTextFromTask() returned an empty string because the artifact list was empty. Fix: Added synchronized keyword to all ClientTaskManager methods (getCurrentTask, saveTaskEvent variants, updateWithMessage) to ensure atomic processing of events regardless of HTTP client threading behavior. Impact: All client transports (JSON-RPC, gRPC, REST) in streaming mode Validated: 200 consecutive test iterations passed on CI Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical concurrency issue within the Highlights
Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request modifies ClientTaskManager.java by adding the synchronized keyword to several public methods, including getCurrentTask, saveTaskEvent (multiple overloads), and updateWithMessage. This change ensures thread safety for task management operations, preventing potential race conditions when accessing or modifying shared task-related state from multiple threads.
| @@ -0,0 +1,69 @@ | |||
| name: Test Race Condition Fix (Temporary) | |||
There was a problem hiding this comment.
@kabir do you plan to keep it in the CI or remove it before merging?
There was a problem hiding this comment.
I thought I had removed it. Done now
…rrent SSE event processing
ClientTaskManager was not thread-safe, causing intermittent test failures when
SSE events were delivered concurrently by the HTTP client. The race condition
occurred when multiple events (startWork, addArtifact, complete) were emitted
rapidly in streaming mode.
Without synchronization, concurrent threads could interleave read-modify-write
operations on currentTask, resulting in lost updates. Specifically, the artifact
update could be overwritten by a status update, leaving the final COMPLETED task
with no artifacts. The fast time to failure of the test 0.017s, indicates that the
COMPLETED status update was received, and that the latch did not time out.
This manifested as intermittent failures in agent-to-agent tests where the test
received a COMPLETED task but extractTextFromTask() returned an empty string
because the artifact list was empty.
Fix: Added synchronized keyword to all ClientTaskManager methods (getCurrentTask,
saveTaskEvent variants, updateWithMessage) to ensure atomic processing of events
regardless of HTTP client threading behavior.
Impact: All client transports (JSON-RPC, gRPC, REST) in streaming mode
Validated: 100 consecutive test iterations passed on CI
Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com