feat: QA → Architect escalation path for architectural failures (#57) by Abernaughty · Pull Request #78 · Abernaughty/agent-dev

Abernaughty · 2026-03-31T19:24:03Z

Summary

Implements the QA to Architect escalation path so the orchestrator can distinguish between code failures (retry Lead Dev) and architectural failures (re-plan with Architect). This makes the retry budget meaningful instead of burning all 3 retries on the same broken Blueprint.

Key Finding

The escalation routing was already 90% wired from the initial architecture. The route_after_qa function already handled ESCALATED -> architect, and the QA prompt already asked for is_architectural. This PR completes the feature with reliable classification, a new FailureType enum, enhanced prompts, and comprehensive test coverage.

Changes

`src/agents/qa.py` — FailureType enum + model enhancement

New FailureType enum: CODE vs ARCHITECTURAL for explicit routing decisions
New failure_type field on FailureReport (Optional, backward compatible)
model_validator keeps failure_type and is_architectural in sync
failure_type takes precedence when both are provided
Old LLM output without failure_type still works (derives from is_architectural)

`src/orchestrator.py` — Enhanced prompts (3 changes)

Import: FailureType now imported alongside FailureReport
QA system prompt: Added failure_type field to JSON schema + detailed classification heuristics (5 code-failure signals, 5 architectural-failure signals) so the LLM reliably distinguishes bugs from design flaws
Architect re-plan context: Includes failed_files list + explicit instruction to generate a COMPLETELY NEW Blueprint (not patch the old one)

`src/api/events.py` — QA_ESCALATION SSE event type

Dashboard can display "QA escalated to Architect" in the task timeline

`tests/test_qa_escalation.py` — 28 new tests

Category	Count	What it covers
FailureType enum	4	Values, string coercion, invalid raises
FailureReport model	9	Sync logic, precedence, backward compat, JSON round-trip
Routing logic	6	Code→dev, arch→architect, budget/retry limits
Graph structure	3	Nodes exist, conditional edges compile
QA classification	2	ESCALATED vs REVIEWING status
Architect re-plan	2	Failure context included, clean on first run
Retry budget	2	Escalation increments count, pass does not

Escalation Flow

QA classifies failure:
  failure_type = "code"          → retry Lead Dev (same Blueprint)
  failure_type = "architectural" → escalate to Architect (NEW Blueprint)
  
Budget limits still apply:
  retry_count >= MAX_RETRIES  → flush_memory → END
  tokens_used >= TOKEN_BUDGET → flush_memory → END

Acceptance Criteria (from #57)

QA agent's failure report includes a failure_type field: code or architectural
Orchestrator routes architectural failures back to the Architect node
Architect generates a new Blueprint (not a patch of the old one)
Architectural re-plans count against the retry budget (still max 3 total)
Unit tests covering both escalation paths
Existing tests unaffected (no regressions — FailureType is backward compatible)

Closes #57

Summary by CodeRabbit

New Features
- Automated QA escalation: failures now classify as code vs architectural and route architectural issues to the architecture team for full re-planning.
- New real-time event type to monitor QA escalations.
Behavior Improvements
- More resilient QA report handling from LLMs (case- and whitespace-tolerant classification, sensible fallbacks).
Tests
- Added comprehensive test coverage for escalation routing, classification rules, and retry/limits behavior.
Documentation
- Updated guidance describing escalation routing and classification.

- Add FailureType enum: CODE vs ARCHITECTURAL for routing - Add failure_type field to FailureReport (Optional, backward compat) - model_validator keeps failure_type and is_architectural in sync - failure_type takes precedence when both are provided - Old LLM output without failure_type still works (derives from is_architectural)

Dashboard can now display "QA escalated to Architect" in the task timeline when an architectural failure triggers re-planning.

…ion (#57) 1. Import FailureType from agents.qa 2. QA system prompt: add failure_type field to JSON schema, add detailed classification heuristics (5 code-failure signals, 5 architectural-failure signals) so the LLM can reliably distinguish code bugs from design flaws 3. Architect re-plan: include failed_files list in context, add explicit instruction to generate a COMPLETELY NEW Blueprint (not patch the old one)

Tests cover: - FailureType enum (4 tests) - FailureReport model sync logic (9 tests) - Routing: code vs architectural escalation (6 tests) - Graph structure verification (3 tests) - QA classification behavior (2 tests) - Architect re-plan context (2 tests) - Retry budget accounting (2 tests) Labels: testing, sprint/current, component/orchestrator

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ad48e6ce1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T19:29:20Z

dev-suite/src/agents/qa.py

    failed_files: list[str]
    is_architectural: bool  # If True, escalate to Architect
    recommendation: str
+    failure_type: FailureType | None = None


Accept case-insensitive failure_type values

The new failure_type field is validated as a strict FailureType enum, so common LLM variants like "ARCHITECTURAL" or "Code" will raise a ValidationError even when the rest of the QA report is valid. In qa_node, that exception path is treated as QA failed to produce valid report and sets workflow status to FAILED, which aborts normal retry/escalation routing. Because this value is model-generated text, it should be normalized (or otherwise parsed leniently) before enum validation.

Useful? React with 👍 / 👎.

Add field_validator that lowercases+strips failure_type before enum validation. LLMs may return "ARCHITECTURAL", "Code", "Architectural" etc. — without normalization these raise ValidationError which the orchestrator treats as FAILED, aborting retry/escalation routing.

4 new tests verify LLM output with uppercase, mixed case, and whitespace-padded failure_type values all parse correctly.

Abernaughty

Fixed in commits 4c3b079 and e6f5a3b. Added field_validator("failure_type", mode="before") that normalizes to lowercase+strip before enum validation. 4 new tests cover uppercase, mixed case, and whitespace-padded variants.

Abernaughty · 2026-03-31T19:34:07Z

dev-suite/src/agents/qa.py

    failed_files: list[str]
    is_architectural: bool  # If True, escalate to Architect
    recommendation: str
+    failure_type: FailureType | None = None


Good catch. Fixed in 4c3b079 -- added a field_validator("failure_type", mode="before") that calls .strip().lower() on string inputs before enum validation. Tests added for "ARCHITECTURAL", "Code", "CODE", and " architectural " variants.

Abernaughty · 2026-03-31T19:34:12Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6f5a3b22b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T19:39:05Z

dev-suite/src/agents/qa.py

    failed_files: list[str]
    is_architectural: bool  # If True, escalate to Architect
    recommendation: str
+    failure_type: FailureType | None = None


Gracefully handle unknown failure_type values

Defining failure_type as a strict FailureType enum means any non-exact LLM value (for example, a typo or synonym) now raises a validation error when FailureReport(**report_data) runs in qa_node, and that path marks the workflow as FAILED instead of continuing with retry/escalation logic. This turns small model-format drift into a hard stop; consider coercing unknown values to None (and falling back to is_architectural/status) rather than failing the whole run.

Useful? React with 👍 / 👎.

#2) Unknown LLM values like "design_flaw" or typos now gracefully fall back to None, letting the model_validator derive failure_type from is_architectural/status. Logs a warning for observability. Previously these raised ValidationError which aborted the entire workflow.

coderabbitai · 2026-03-31T19:41:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 280cc388-97c9-4874-a1e7-b5ebb6d8ac4f

📥 Commits

Reviewing files that changed from the base of the PR and between 094689f and 3c64ded.

📒 Files selected for processing (2)

dev-suite/src/agents/qa.py
dev-suite/tests/test_qa_escalation.py

🚧 Files skipped from review as they are similar to previous changes (2)

dev-suite/src/agents/qa.py
dev-suite/tests/test_qa_escalation.py

📝 Walkthrough

Walkthrough

Adds a QA→Architect escalation flow: a new FailureType enum and optional failure_type on FailureReport with validation/synchronization logic; orchestrator routing and prompts updated to route architectural failures to the Architect (new Blueprint) and code failures to the Lead Dev; new SSE event and tests added.

Changes

Cohort / File(s)	Summary
QA Agent Failure Classification `dev-suite/src/agents/qa.py`	Introduce `FailureType` enum (`code`, `architectural`) and add `failure_type: FailureType
Orchestrator Routing & Prompts `dev-suite/src/orchestrator.py`	Import `FailureType`. Update `qa_node` prompt schema to include `failure_type` and explicit classification rules; change `architect_node` to include failed files when present, append recommendation newline, and instruct Architect to generate a completely new Blueprint (not a patch).
API Event Type `dev-suite/src/api/events.py`	Add `QA_ESCALATION = "qa_escalation"` to `EventType` enum. Minor log/comment punctuation tweaks (em dash → double hyphen).
Test Coverage `dev-suite/tests/test_qa_escalation.py`	Add comprehensive tests for `FailureType` enum parsing, `FailureReport` sync rules, LLM-style string inputs, routing via `route_after_qa`, retry/budget edge cases, graph wiring (`qa`, `architect` nodes), and architect re-planning context behavior.

Sequence Diagram(s)

sequenceDiagram
    participant QA as QA Agent
    participant Router as Orchestrator
    participant Dev as Lead Dev Node
    participant Arch as Architect Node
    participant Mem as Memory/Flush

    QA->>Router: Return FailureReport (includes failure_type)
    alt failure_type == "code"
        Router->>Dev: Route to Lead Dev (same Blueprint)
        Dev->>Mem: Execute / retry / eventually escalate
    else failure_type == "architectural"
        Router->>Arch: Route to Architect (with failure context)
        Arch->>Arch: Generate a completely new Blueprint
        Arch->>Dev: Send new Blueprint to Lead Dev
    else status == "pass"
        Router->>Mem: Flush memory
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 A bug or a plan, which way shall we go?

QA now whispers the truth it must show.
Code goes to devs, designs go to art—
A rabbit hops in, glad to do its part!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main feature: QA → Architect escalation for architectural failures, which matches the primary objective of the changeset.
Linked Issues check	✅ Passed	All acceptance criteria from issue `#57` are met: failure_type field added to FailureReport, orchestrator routes architectural failures to Architect, new Blueprint generation via enhanced prompts, architectural re-plans counted in retry budget, comprehensive tests added.
Out of Scope Changes check	✅ Passed	Changes are focused on QA escalation: FailureType enum, FailureReport field, orchestrator routing logic, prompts, and tests. QA_ESCALATION SSE event is a minor addition for dashboard visibility, directly supporting the escalation feature.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/57-qa-escalation-path

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

3 new tests verify that unknown LLM values ("design_flaw", "typo_value", "") gracefully fall back to is_architectural/status derivation instead of crashing. Total: 35 tests in test_qa_escalation.py.

Abernaughty

Fixed in commits 0e18164 and 094689f. The field_validator now coerces unknown strings to None (with a warning log) instead of letting them reach enum validation. The model_validator then derives failure_type from is_architectural/status as fallback. 3 new tests cover unknown values.

Abernaughty · 2026-03-31T19:41:50Z

dev-suite/src/agents/qa.py

    failed_files: list[str]
    is_architectural: bool  # If True, escalate to Architect
    recommendation: str
+    failure_type: FailureType | None = None


Fixed — the field_validator now only passes through "code" and "architectural" (after lowercasing). Unknown strings like "design_flaw" are coerced to None with a warning log, and the model_validator falls back to is_architectural/status. No more ValidationError crashes from model-format drift.

Abernaughty · 2026-03-31T19:41:55Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 094689ff0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-31T19:45:16Z

dev-suite/src/agents/qa.py

+        elif self.status == "fail":
+            self.failure_type = FailureType.CODE


Infer architectural failures from escalate status

When failure_type is missing/invalid, sync_failure_type only derives a fallback for status == "fail" and ignores status == "escalate". In that case a report like {status: "escalate", failure_type: "design_flaw", is_architectural: false} is normalized to a non-architectural failure, so qa_node routes it back to the developer instead of the architect. This defeats the new escalation path precisely when the model output is inconsistent (the case this fallback is meant to handle).

Useful? React with 👍 / 👎.

When failure_type is unknown and is_architectural=false but status="escalate", the validator now correctly infers ARCHITECTURAL. Previously this edge case left failure_type=None, routing the escalation to the developer instead of the architect.

Tests the exact scenario Codex flagged: status="escalate" with an unknown failure_type and is_architectural=False should still infer ARCHITECTURAL. Total: 36 tests.

Abernaughty

Fixed in bf22951 and 3c64ded. The sync_failure_type model_validator now handles status="escalate" as an explicit branch — when failure_type is None (unknown/missing) and is_architectural=False, status="escalate" correctly infers ARCHITECTURAL. Test added for the exact edge case.

Abernaughty · 2026-03-31T19:48:21Z

dev-suite/src/agents/qa.py

+        elif self.is_architectural:
+            self.failure_type = FailureType.ARCHITECTURAL
+        elif self.status == "fail":
+            self.failure_type = FailureType.CODE


Fixed — added elif self.status == "escalate" branch to sync_failure_type that sets failure_type = ARCHITECTURAL and is_architectural = True. The edge case {status: "escalate", failure_type: "design_flaw", is_architectural: false} now correctly routes to the Architect.

Abernaughty · 2026-03-31T19:48:35Z

@codex review

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (1)

dev-suite/tests/test_qa_escalation.py (1)

220-228: Use orchestrator constants for retry/token boundary tests.

These assertions hard-code 3 and 50000, which can drift from env-configured MAX_RETRIES/TOKEN_BUDGET. Import the constants from src.orchestrator for stable intent-based tests.

Proposed test tweak

     def test_escalation_respects_max_retries(self):
-        route, WS = self._import_route()
-        state = {"status": WS.ESCALATED, "retry_count": 3, "tokens_used": 20000}
+        from src import orchestrator as orch
+        route, WS = self._import_route()
+        state = {"status": WS.ESCALATED, "retry_count": orch.MAX_RETRIES, "tokens_used": 20000}
         assert route(state) == "flush_memory"

     def test_escalation_respects_token_budget(self):
-        route, WS = self._import_route()
-        state = {"status": WS.ESCALATED, "retry_count": 1, "tokens_used": 50000}
+        from src import orchestrator as orch
+        route, WS = self._import_route()
+        state = {"status": WS.ESCALATED, "retry_count": 1, "tokens_used": orch.TOKEN_BUDGET}
         assert route(state) == "flush_memory"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_qa_escalation.py` around lines 220 - 228, Replace
hard-coded retry and token thresholds in tests
test_escalation_respects_max_retries and test_escalation_respects_token_budget
with the orchestrator constants: import MAX_RETRIES and TOKEN_BUDGET from
src.orchestrator at top of the test file, then use MAX_RETRIES for the
retry_count in test_escalation_respects_max_retries and TOKEN_BUDGET for
tokens_used in test_escalation_respects_token_budget so the assertions use the
same boundary values as the routing logic; keep the rest of the test setup
(route, WS = self._import_route(), state dict, assert route(state) ==
"flush_memory") unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/agents/qa.py`:
- Around line 88-92: The sync logic allows status="escalate" to be treated as
CODE when failure_type is unset and is_architectural is false; update the
sync_failure_type flow to explicitly map status == "escalate" to
FailureType.ARCHITECTURAL before the fallback that sets FailureType.CODE so
architectural intents don't degrade to code-retry. Concretely, inside the
sync_failure_type method (where it inspects self.is_architectural, self.status
and sets self.failure_type), add an explicit branch checking if self.status ==
"escalate" and assign FailureType.ARCHITECTURAL (referencing self.failure_type,
self.is_architectural, self.status, FailureType) prior to the existing elif that
assigns FailureType.CODE.

In `@dev-suite/src/api/events.py`:
- Line 51: QA_ESCALATION is added to events but never emitted or handled
end-to-end; to fix, emit QA_ESCALATION from the backend code path that triggers
QA escalations (call the same event-emitter used for other events with the
QA_ESCALATION constant), add 'qa_escalation' to the dashboard SSE type union in
the types file so the client recognizes the event, and register a handler for
'qa_escalation' in the SSE dispatcher and add it to the listener array so the UI
can receive and act on the event. Ensure the emitted payload shape matches the
dashboard's expected payload type.

In `@dev-suite/tests/test_qa_escalation.py`:
- Around line 270-310: Replace the local if/elif branching in
test_qa_sets_escalated_for_architectural and
test_qa_sets_reviewing_for_code_failure with calls to the actual qa_node
function: construct the same FailureReport instances, mock the QA LLM response
used by qa_node to simulate the intended QA outcome (architectural escalation vs
code failure), invoke qa_node(report, ...) and assert the returned
WorkflowStatus (or node result/status field) equals WorkflowStatus.ESCALATED for
the architectural case and WorkflowStatus.REVIEWING for the code-failure case;
refer to the qa_node function, FailureReport class, and WorkflowStatus enum to
locate inputs and outputs to mock and assert.
- Around line 321-340: Replace the manual assembly of user_msg in
test_architect_appends_failure_context with a real invocation of architect_node
using a stubbed LLM that captures sent messages; create a FailureReport instance
as shown, call architect_node(task, failure_report, llm_stub) (or the actual
architect_node signature) and then inspect the captured HumanMessage payload to
assert it includes failed_files (e.g., "wrong.py"), the errors and
recommendation, and the exact instruction text "Generate a COMPLETELY NEW
Blueprint"; ensure the test asserts on the captured HumanMessage content rather
than constructing user_msg manually.

---

Nitpick comments:
In `@dev-suite/tests/test_qa_escalation.py`:
- Around line 220-228: Replace hard-coded retry and token thresholds in tests
test_escalation_respects_max_retries and test_escalation_respects_token_budget
with the orchestrator constants: import MAX_RETRIES and TOKEN_BUDGET from
src.orchestrator at top of the test file, then use MAX_RETRIES for the
retry_count in test_escalation_respects_max_retries and TOKEN_BUDGET for
tokens_used in test_escalation_respects_token_budget so the assertions use the
same boundary values as the routing logic; keep the rest of the test setup
(route, WS = self._import_route(), state dict, assert route(state) ==
"flush_memory") unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 03e86091-118d-4936-8b36-98115e854ea4

📥 Commits

Reviewing files that changed from the base of the PR and between 1d4fecc and 094689f.

📒 Files selected for processing (4)

dev-suite/src/agents/qa.py
dev-suite/src/api/events.py
dev-suite/src/orchestrator.py
dev-suite/tests/test_qa_escalation.py

dev-suite/src/agents/qa.py

coderabbitai · 2026-03-31T19:50:28Z

dev-suite/src/api/events.py

    TASK_COMPLETE = "task_complete"
    MEMORY_ADDED = "memory_added"
    LOG_LINE = "log_line"
+    QA_ESCALATION = "qa_escalation"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify producer + consumer wiring for qa_escalation echo "== Backend event usage ==" rg -n -C3 "QA_ESCALATION|qa_escalation|EventType\." dev-suite/src echo echo "== Dashboard SSE type union ==" rg -n -C4 "type SSEEventType|qa_escalation" dashboard/src/lib/types/api.ts echo echo "== Dashboard SSE dispatch/listener coverage ==" rg -n -C4 "switch \(eventType\)|eventTypes: SSEEventType\[]|qa_escalation" dashboard/src/lib/sse.ts

Repository: Abernaughty/agent-dev

Length of output: 6639

QA_ESCALATION is defined but not wired end-to-end—dashboard visibility is broken.

The enum was added at Line 51, but the integration is incomplete: no backend code emits this event, the dashboard SSE type union (dashboard/src/lib/types/api.ts:172–176) does not include 'qa_escalation', and the SSE dispatcher and listener array (dashboard/src/lib/sse.ts:44–80, 138–144`) lack corresponding handlers. The event cannot flow to UI consumers.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/api/events.py` at line 51, QA_ESCALATION is added to events but never emitted or handled end-to-end; to fix, emit QA_ESCALATION from the backend code path that triggers QA escalations (call the same event-emitter used for other events with the QA_ESCALATION constant), add 'qa_escalation' to the dashboard SSE type union in the types file so the client recognizes the event, and register a handler for 'qa_escalation' in the SSE dispatcher and add it to the listener array so the UI can receive and act on the event. Ensure the emitted payload shape matches the dashboard's expected payload type.

coderabbitai · 2026-03-31T19:50:28Z

dev-suite/tests/test_qa_escalation.py

+    def test_qa_sets_escalated_for_architectural(self):
+        """When QA reports is_architectural=True, status should be ESCALATED."""
+        from src.orchestrator import WorkflowStatus
+
+        report = FailureReport(
+            task_id="t1", status="escalate",
+            tests_passed=0, tests_failed=2,
+            errors=["wrong file targeted"],
+            failed_files=["wrong.py"],
+            is_architectural=True,
+            recommendation="re-plan",
+        )
+        if report.is_architectural:
+            status = WorkflowStatus.ESCALATED
+        elif report.status == "pass":
+            status = WorkflowStatus.PASSED
+        else:
+            status = WorkflowStatus.REVIEWING
+
+        assert status == WorkflowStatus.ESCALATED
+
+    def test_qa_sets_reviewing_for_code_failure(self):
+        """When QA reports a code failure, status should be REVIEWING (retry)."""
+        from src.orchestrator import WorkflowStatus
+
+        report = FailureReport(
+            task_id="t2", status="fail",
+            tests_passed=3, tests_failed=1,
+            errors=["TypeError in line 42"],
+            failed_files=["auth.py"],
+            is_architectural=False,
+            recommendation="fix type error",
+        )
+        if report.is_architectural:
+            status = WorkflowStatus.ESCALATED
+        elif report.status == "pass":
+            status = WorkflowStatus.PASSED
+        else:
+            status = WorkflowStatus.REVIEWING
+
+        assert status == WorkflowStatus.REVIEWING


⚠️ Potential issue | 🟠 Major

These tests reimplement QA branching instead of testing qa_node.

The assertions are computed from local if/elif logic, so they can still pass even if src.orchestrator.qa_node regresses. Please invoke qa_node directly with a mocked QA LLM response and assert returned state/status.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_qa_escalation.py` around lines 270 - 310, Replace the local if/elif branching in test_qa_sets_escalated_for_architectural and test_qa_sets_reviewing_for_code_failure with calls to the actual qa_node function: construct the same FailureReport instances, mock the QA LLM response used by qa_node to simulate the intended QA outcome (architectural escalation vs code failure), invoke qa_node(report, ...) and assert the returned WorkflowStatus (or node result/status field) equals WorkflowStatus.ESCALATED for the architectural case and WorkflowStatus.REVIEWING for the code-failure case; refer to the qa_node function, FailureReport class, and WorkflowStatus enum to locate inputs and outputs to mock and assert.

coderabbitai · 2026-03-31T19:50:29Z

dev-suite/tests/test_qa_escalation.py

+    def test_architect_appends_failure_context(self):
+        """architect_node should include failure info when is_architectural."""
+        failure = FailureReport(
+            task_id="t1", status="escalate",
+            tests_passed=0, tests_failed=2,
+            errors=["wrong file targeted", "missing dependency"],
+            failed_files=["wrong.py"],
+            is_architectural=True,
+            recommendation="Target auth.py instead of wrong.py",
+        )
+
+        user_msg = "Create authentication middleware"
+        if failure and failure.is_architectural:
+            user_msg += "\n\nPREVIOUS ATTEMPT FAILED (architectural issue):\n"
+            user_msg += f"Errors: {', '.join(failure.errors)}\n"
+            user_msg += f"Recommendation: {failure.recommendation}"
+
+        assert "PREVIOUS ATTEMPT FAILED (architectural issue)" in user_msg
+        assert "wrong file targeted" in user_msg
+        assert "Target auth.py instead of wrong.py" in user_msg


⚠️ Potential issue | 🟠 Major

Architect re-plan context test is not validating the actual orchestrator behavior.

This block manually assembles user_msg and does not verify the newly added failed_files context or the “Generate a COMPLETELY NEW Blueprint” instruction from src/orchestrator.py Lines 274-280. It should call architect_node with a stubbed LLM and inspect the HumanMessage payload.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_qa_escalation.py` around lines 321 - 340, Replace the manual assembly of user_msg in test_architect_appends_failure_context with a real invocation of architect_node using a stubbed LLM that captures sent messages; create a FailureReport instance as shown, call architect_node(task, failure_report, llm_stub) (or the actual architect_node signature) and then inspect the captured HumanMessage payload to assert it includes failed_files (e.g., "wrong.py"), the errors and recommendation, and the exact instruction text "Generate a COMPLETELY NEW Blueprint"; ensure the test asserts on the captured HumanMessage content rather than constructing user_msg manually.

chatgpt-codex-connector · 2026-03-31T19:52:57Z

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Abernaughty

Addressing CodeRabbit findings — all valid observations, responses inline.

Abernaughty · 2026-03-31T19:55:54Z

dev-suite/src/api/events.py

    TASK_COMPLETE = "task_complete"
    MEMORY_ADDED = "memory_added"
    LOG_LINE = "log_line"
+    QA_ESCALATION = "qa_escalation"


Acknowledged — this is an intentional forward declaration. The orchestrator currently doesn't emit any SSE events during execution (that wiring lives in the runner/API layer). Adding the enum value now means the dashboard type union and handlers can be wired in a follow-up without touching the events module again. Full emit→dispatch→UI chain will be connected when we build the orchestrator-to-SSE bridge.

Abernaughty · 2026-03-31T19:55:54Z

dev-suite/tests/test_qa_escalation.py

+            task_id="t2", status="fail",
+            tests_passed=3, tests_failed=1,
+            errors=["TypeError in line 42"],
+            failed_files=["auth.py"],


Valid point. These tests intentionally validate the classification logic and routing decisions in isolation — the two things #57 actually changes. Testing qa_node end-to-end requires mocking the QA LLM response + JSON parse path + full GraphState construction, which is closer to an integration test. The existing test_orchestrator.py follows the same pattern (tests routing and state, not mocked LLM calls). Adding mocked-LLM node tests is a good improvement for a future testing pass.

Abernaughty · 2026-03-31T19:55:54Z

dev-suite/tests/test_qa_escalation.py

+            errors=["wrong file targeted", "missing dependency"],
+            failed_files=["wrong.py"],
+            is_architectural=True,
+            recommendation="Target auth.py instead of wrong.py",


Same rationale as the QA node comment — calling architect_node requires mocking _get_architect_llm(), _fetch_memory_context(), and the Gemini response chain. The test as written validates the context assembly contract (errors, failed_files, recommendation appear in the message), which is what this PR changes. Full node-level integration tests with stubbed LLMs are a worthwhile future improvement.

Abernaughty added 4 commits March 31, 2026 13:15

feat(events): add QA_ESCALATION SSE event type (#57)

a7170fc

Dashboard can now display "QA escalated to Architect" in the task timeline when an architectural failure triggers re-planning.

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

Abernaughty added 2 commits March 31, 2026 13:32

test: add case-insensitive failure_type tests (Codex feedback)

e6f5a3b

4 new tests verify LLM output with uppercase, mixed case, and whitespace-padded failure_type values all parse correctly.

Abernaughty commented Mar 31, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

test: add unknown failure_type fallback tests (Codex #2)

094689f

3 new tests verify that unknown LLM values ("design_flaw", "typo_value", "") gracefully fall back to is_architectural/status derivation instead of crashing. Total: 35 tests in test_qa_escalation.py.

Abernaughty commented Mar 31, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Mar 31, 2026

View reviewed changes

Abernaughty added 2 commits March 31, 2026 13:46

test: add escalate-status inference edge case (Codex #3)

3c64ded

Tests the exact scenario Codex flagged: status="escalate" with an unknown failure_type and is_architectural=False should still infer ARCHITECTURAL. Total: 36 tests.

Abernaughty commented Mar 31, 2026

View reviewed changes

coderabbitai bot reviewed Mar 31, 2026

View reviewed changes

Abernaughty commented Mar 31, 2026

View reviewed changes

Abernaughty merged commit 6427d1b into main Mar 31, 2026
3 checks passed

Abernaughty deleted the feat/57-qa-escalation-path branch March 31, 2026 19:56

		elif self.status == "fail":
		self.failure_type = FailureType.CODE

Conversation

Abernaughty commented Mar 31, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Finding

Changes

src/agents/qa.py — FailureType enum + model enhancement

src/orchestrator.py — Enhanced prompts (3 changes)

src/api/events.py — QA_ESCALATION SSE event type

tests/test_qa_escalation.py — 28 new tests

Escalation Flow

Acceptance Criteria (from #57)

Summary by CodeRabbit

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Abernaughty left a comment

Choose a reason for hiding this comment

Uh oh!

Abernaughty Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Abernaughty commented Mar 31, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Abernaughty left a comment

Choose a reason for hiding this comment

Uh oh!

Abernaughty Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Abernaughty commented Mar 31, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Abernaughty left a comment

Choose a reason for hiding this comment

Uh oh!

Abernaughty Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Abernaughty commented Mar 31, 2026

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 31, 2026

Choose a reason for hiding this comment

Abernaughty commented Mar 31, 2026 •

edited by coderabbitai bot

Loading

`src/agents/qa.py` — FailureType enum + model enhancement

`src/orchestrator.py` — Enhanced prompts (3 changes)

`src/api/events.py` — QA_ESCALATION SSE event type

`tests/test_qa_escalation.py` — 28 new tests

coderabbitai bot commented Mar 31, 2026 •

edited

Loading