-
Notifications
You must be signed in to change notification settings - Fork 0
feat: QA → Architect escalation path for architectural failures (#57) #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4ad329c
a7170fc
add08f1
5ad48e6
4c3b079
e6f5a3b
0e18164
094689f
bf22951
3c64ded
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,19 +1,41 @@ | ||
| """QA agent - runs tests, audits security, writes failure reports. | ||
|
|
||
| Produces structured failure reports for retry context. | ||
| Can escalate architectural failures back to Architect. | ||
|
|
||
| Implementation in Step 4. | ||
| Can escalate architectural failures back to Architect via | ||
| failure_type classification. | ||
| """ | ||
|
|
||
| from pydantic import BaseModel | ||
| import logging | ||
| from enum import Enum | ||
|
|
||
| from pydantic import BaseModel, field_validator, model_validator | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class FailureType(str, Enum): | ||
| """Classification of QA failure for routing decisions. | ||
|
|
||
| The orchestrator uses this to decide where to route after QA: | ||
| - CODE: Bug, syntax error, test failure in the implementation. | ||
| Action: retry with Lead Dev using the same Blueprint. | ||
| - ARCHITECTURAL: Wrong target file, missing dependency, design flaw. | ||
| Action: escalate to Architect for a new Blueprint. | ||
| """ | ||
|
|
||
| CODE = "code" | ||
| ARCHITECTURAL = "architectural" | ||
|
|
||
|
|
||
| class FailureReport(BaseModel): | ||
| """Structured report passed back on QA failure. | ||
|
|
||
| Included in retry context so Lead Dev knows exactly | ||
| what broke and why. | ||
| Included in retry context so the orchestrator can route to the | ||
| correct agent: Lead Dev for code fixes, Architect for re-planning. | ||
|
|
||
| The failure_type field is the primary classifier. The is_architectural | ||
| bool is kept for backward compatibility and stays in sync via the | ||
| model validator. | ||
| """ | ||
|
|
||
| task_id: str | ||
|
|
@@ -24,10 +46,54 @@ class FailureReport(BaseModel): | |
| failed_files: list[str] | ||
| is_architectural: bool # If True, escalate to Architect | ||
| recommendation: str | ||
| failure_type: FailureType | None = None | ||
|
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch. Fixed in 4c3b079 -- added a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Defining Useful? React with 👍 / 👎.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed — the |
||
|
|
||
| @field_validator("failure_type", mode="before") | ||
| @classmethod | ||
| def normalize_failure_type(cls, v: object) -> object: | ||
| """Accept case-insensitive failure_type values from LLM output. | ||
|
|
||
| LLMs may return "ARCHITECTURAL", "Code", or even typos like | ||
| "design_flaw". Normalize known values to lowercase; coerce | ||
| unknown strings to None so the model_validator can fall back | ||
| to is_architectural/status instead of crashing the workflow. | ||
| """ | ||
| if isinstance(v, str): | ||
| normalized = v.strip().lower() | ||
| if normalized in ("code", "architectural"): | ||
| return normalized | ||
| # Unknown value -- log and fall back to None | ||
| if normalized: | ||
| logger.warning( | ||
| "Unknown failure_type '%s' from LLM output, " | ||
| "falling back to is_architectural/status", | ||
| v, | ||
| ) | ||
| return None | ||
| return v | ||
|
|
||
| @model_validator(mode="after") | ||
| def sync_failure_type(self) -> "FailureReport": | ||
| """Keep failure_type and is_architectural in sync. | ||
|
|
||
| # TODO Step 4: | ||
| # - Define QA prompt template | ||
| # - Implement test execution via E2B sandbox | ||
| # - Parse structured output from sandbox wrapper | ||
| # - Determine pass/fail/escalate decision | ||
| If failure_type is provided, it takes precedence and syncs | ||
| is_architectural. If only is_architectural is set (backward | ||
| compat from older LLM output), failure_type is derived. | ||
| status="escalate" also implies ARCHITECTURAL even when | ||
| is_architectural was not explicitly set. | ||
| """ | ||
| if self.failure_type is not None: | ||
| # failure_type takes precedence | ||
| self.is_architectural = ( | ||
| self.failure_type == FailureType.ARCHITECTURAL | ||
| ) | ||
| elif self.is_architectural: | ||
| self.failure_type = FailureType.ARCHITECTURAL | ||
| elif self.status == "escalate": | ||
| # status=escalate implies architectural failure | ||
| self.failure_type = FailureType.ARCHITECTURAL | ||
| self.is_architectural = True | ||
| elif self.status == "fail": | ||
| self.failure_type = FailureType.CODE | ||
|
Comment on lines
+96
to
+97
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When Useful? React with 👍 / 👎.
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed — added |
||
| # status == "pass" leaves failure_type as None (no failure) | ||
coderabbitai[bot] marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| return self | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| """Async event bus for real-time SSE streaming to the dashboard. | ||
|
|
||
| Issue #35: SSE Event System — Real-Time Task Streaming | ||
| Issue #35: SSE Event System -- Real-Time Task Streaming | ||
|
|
||
| The EventBus is a singleton that LangGraph nodes publish events to. | ||
| Connected SSE clients each get their own asyncio.Queue for fan-out. | ||
|
|
@@ -14,7 +14,7 @@ | |
| data={"agent": "dev", "status": "coding", "task_id": "auth-rls"}, | ||
| )) | ||
|
|
||
| Usage (subscribing — handled by the /stream endpoint): | ||
| Usage (subscribing -- handled by the /stream endpoint): | ||
| queue = event_bus.subscribe() | ||
| try: | ||
| while True: | ||
|
|
@@ -48,6 +48,7 @@ class EventType(str, Enum): | |
| TASK_COMPLETE = "task_complete" | ||
| MEMORY_ADDED = "memory_added" | ||
| LOG_LINE = "log_line" | ||
| QA_ESCALATION = "qa_escalation" | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Verify producer + consumer wiring for qa_escalation
echo "== Backend event usage =="
rg -n -C3 "QA_ESCALATION|qa_escalation|EventType\." dev-suite/src
echo
echo "== Dashboard SSE type union =="
rg -n -C4 "type SSEEventType|qa_escalation" dashboard/src/lib/types/api.ts
echo
echo "== Dashboard SSE dispatch/listener coverage =="
rg -n -C4 "switch \(eventType\)|eventTypes: SSEEventType\[]|qa_escalation" dashboard/src/lib/sse.tsRepository: Abernaughty/agent-dev Length of output: 6639
The enum was added at Line 51, but the integration is incomplete: no backend code emits this event, the dashboard SSE type union ( 🤖 Prompt for AI Agents
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Acknowledged — this is an intentional forward declaration. The orchestrator currently doesn't emit any SSE events during execution (that wiring lives in the runner/API layer). Adding the enum value now means the dashboard type union and handlers can be wired in a follow-up without touching the events module again. Full emit→dispatch→UI chain will be connected when we build the orchestrator-to-SSE bridge. |
||
|
|
||
|
|
||
| class SSEEvent(BaseModel): | ||
|
|
@@ -133,7 +134,7 @@ async def publish(self, event: SSEEvent) -> int: | |
| delivered += 1 | ||
| except asyncio.QueueFull: | ||
| logger.warning( | ||
| "SSE client queue full — dropping event %s (counter=%d)", | ||
| "SSE client queue full -- dropping event %s (counter=%d)", | ||
| event.type.value, | ||
| self._event_counter, | ||
| ) | ||
|
|
@@ -156,6 +157,6 @@ async def clear(self) -> None: | |
| logger.info("EventBus cleared all subscribers") | ||
|
|
||
|
|
||
| # ── Singleton ── | ||
| # -- Singleton -- | ||
|
|
||
| event_bus = EventBus() | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new
failure_typefield is validated as a strictFailureTypeenum, so common LLM variants like"ARCHITECTURAL"or"Code"will raise aValidationErroreven when the rest of the QA report is valid. Inqa_node, that exception path is treated asQA failed to produce valid reportand sets workflow status toFAILED, which aborts normal retry/escalation routing. Because this value is model-generated text, it should be normalized (or otherwise parsed leniently) before enum validation.Useful? React with 👍 / 👎.