feat: Agent tool binding — give Dev and QA agents access to workspace tools (#80) #83
Abernaughty merged 17 commits into main
Conversation
…tools_config (#80)
- Add tool_calls_log to GraphState and AgentState
- Add MAX_TOOL_TURNS config (default 10)
- Add DEV_TOOL_NAMES and QA_TOOL_NAMES role-based tool sets
- Add _get_agent_tools() to extract tools from RunnableConfig
- Add _execute_tool_call() for individual tool execution
- Add _run_tool_loop() for iterative tool calling
- Convert developer_node to async with tool binding support
- Convert qa_node to async with read-only tool subset
- Add init_tools_config() factory for workspace tool setup
- Update run_task() to include tools in invoke config
- Graceful fallback: no tools = original single-shot behavior
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits. This behavior is configurable in the CodeRabbit settings.
📝 Walkthrough
Adds workspace tool support: tools are initialized and injected into workflows; Dev/QA LLMs are bound to role-filtered tools; an iterative tool-execution loop records tool calls, emits TOOL_CALL SSE events, and developer/qa nodes accept an optional config.
Sequence Diagram(s)
sequenceDiagram
participant TaskRunner
participant Orchestrator
participant LLM as "LLM (bound)"
participant ToolProvider
participant EventBus
TaskRunner->>Orchestrator: init_tools_config(workspace_root)
Orchestrator->>ToolProvider: load tools (mcp-config.json)
ToolProvider-->>Orchestrator: tools[]
Orchestrator-->>TaskRunner: tools_config
TaskRunner->>Orchestrator: run_task(config includes tools_config)
Orchestrator->>LLM: bind_tools(tools_for_role)
Orchestrator->>LLM: invoke(messages)
LLM-->>Orchestrator: response (may include tool_calls)
loop tool call loop (<= MAX_TOOL_TURNS)
Orchestrator->>ToolProvider: execute_tool(tool_call)
ToolProvider-->>Orchestrator: tool_result
Orchestrator->>LLM: invoke(messages + ToolMessage(tool_result))
LLM-->>Orchestrator: response (more tool_calls or final)
Orchestrator->>EventBus: publish TOOL_CALL event
EventBus-->>TaskRunner: TOOL_CALL SSE event
end
Orchestrator-->>TaskRunner: final state (includes tool_calls_log)
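The loop in the diagram reduces to a small pattern: invoke the bound LLM, execute any requested tool calls, feed results back, and repeat until there are no more tool calls or the turn cap is hit. A minimal sketch of that pattern (FakeLLM, the dict-shaped messages, and the filesystem_read stub are illustrative stand-ins, not the repo's actual classes):

```python
import asyncio

MAX_TOOL_TURNS = 10  # mirrors the PR's config default

class FakeLLM:
    """Stand-in for a tool-bound LLM: asks for one tool call, then finishes."""
    def __init__(self):
        self.turn = 0

    async def ainvoke(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"content": "", "tool_calls": [
                {"id": "c1", "name": "filesystem_read", "args": {"path": "README.md"}}]}
        return {"content": "done", "tool_calls": []}

async def run_tool_loop(llm, tools, messages, max_turns=MAX_TOOL_TURNS):
    """Iterate: invoke LLM, execute requested tool calls, feed results back."""
    tool_calls_log = []
    response = await llm.ainvoke(messages)
    for _ in range(max_turns):
        calls = response.get("tool_calls") or []
        if not calls:
            break  # final answer, no further tool use requested
        for tc in calls:
            result = tools[tc["name"]](**tc["args"])
            tool_calls_log.append({"tool": tc["name"], "success": True})
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": str(result)})
        response = await llm.ainvoke(messages)
    return response, tool_calls_log

response, log = asyncio.run(
    run_tool_loop(FakeLLM(), {"filesystem_read": lambda path: f"<{path}>"}, []))
print(response["content"], len(log))  # -> done 1
```

The same shape also gives the graceful fallback for free: with no tool calls in the first response, the loop exits immediately and behaves like the original single-shot path.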
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
…a_node (#80)
developer_node and qa_node are now async (issue #80). Updated 4 tests:
- test_developer_node_generates_code -> async + await
- test_qa_node_returns_pass -> async + await
- test_qa_node_returns_fail -> async + await
- test_qa_node_returns_escalate -> async + await
Pipeline tests (run_task) remain sync — LangGraph handles async nodes internally via workflow.invoke().
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/api/events.py`:
- Around line 52-53: The dashboard doesn't recognize the new
QA_ESCALATION/TOOL_CALL event because the dashboard type union and SSE wiring
lack 'tool_call'; update the dashboard API types (add 'tool_call' to the union
in api type definitions) and add handling in the SSE client (subscribe for
'tool_call' in the SSE handling code and dispatch it to the rest of the app
using the same window event pattern used for other events so the dashboard
receives real-time updates). Ensure you reference the same event string
"tool_call" in the type union, the SSE subscription switch/handler, and the
window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.
In `@dev-suite/src/api/runner.py`:
- Around line 337-350: The current logic uses the cumulative tool_calls_log to
compute dev_tool_calls, causing counts to include previous retries; to fix,
record the baseline index/length of output["tool_calls_log"] at the start of the
developer pass (e.g., save a variable like dev_pass_tool_calls_start =
len(output.get("tool_calls_log", [])) before generating code), and when
computing dev_tool_calls replace the filter with only the new slice
output["tool_calls_log"][dev_pass_tool_calls_start:] (then filter by
tc.get("agent") == "developer") so the action string uses only tool calls from
the current developer pass; update any references in the code block that builds
action/event_type/retry to use this per-pass dev_tool_calls.
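The per-pass baseline fix described above can be sketched as follows (count_dev_calls_this_pass and the dict shapes are hypothetical names for illustration; the real runner builds the action string inline):

```python
def count_dev_calls_this_pass(output, baseline):
    """Count only developer tool calls made since `baseline` (start of this pass)."""
    new_calls = output.get("tool_calls_log", [])[baseline:]
    return sum(1 for tc in new_calls if tc.get("agent") == "developer")

# Simulated state: two calls left over from a previous retry, then one new call.
output = {"tool_calls_log": [{"agent": "developer"}, {"agent": "qa"}]}
baseline = len(output["tool_calls_log"])  # recorded before the developer pass runs
output["tool_calls_log"].append({"agent": "developer"})  # call made during this pass
print(count_dev_calls_this_pass(output, baseline))  # -> 1
```

Slicing from the recorded baseline means the count resets naturally on each retry, while the cumulative tool_calls_log stays intact for the final state.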
In `@dev-suite/src/orchestrator.py`:
- Around line 619-623: The call site in orchestrator.py is dereferencing
plan.timeout which raises AttributeError because ValidationPlan lacks a timeout
attribute; fix by either adding a timeout field/default to the ValidationPlan
class or stop passing plan.timeout here and let _run_sandbox_validation() use
its own default. Locate the ValidationPlan definition (class ValidationPlan) and
add an optional timeout attribute with a reasonable default, or remove the
timeout argument from the _run_sandbox_validation(...) call in the
sandbox_validate path so the helper's internal default is used; update any
callers/tests accordingly to keep behavior consistent.
- Around line 345-350: tool_calls_log currently saves raw previews in
args_preview and result_preview which can leak file contents or secrets (e.g.,
from filesystem_write, filesystem_read, github_read_diff) into GraphState and
TOOL_CALL events; instead, sanitize and persist only redacted metadata: for
args_preview derive and store non-sensitive fields such as filenames, sizes,
content hashes or a fixed "[REDACTED]" tag when any arg key indicates
file/content, and for result_preview detect large/plaintext content and replace
with metadata like "type":"file"|"text", "size":N, "hash":H or "[REDACTED]" if
content appears to contain source/secrets, updating the tool_calls_log
population logic to call a redaction helper (e.g., redact_tool_preview) before
assigning args_preview/result_preview so no raw content is written to GraphState
or emitted in TOOL_CALL.
- Around line 268-274: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside retrying LLM loops; remove
"github_create_pr" from the DEV_TOOL_NAMES set to prevent automatic PR creation
during iterative tool runs, and instead move PR creation to an explicit
post-review flow or wrap it behind an approval/confirmation guard; update any
code paths or tests that reference DEV_TOOL_NAMES to account for its removal and
ensure PR creation is only invoked from the designated
post-review/human-approved function.
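Role-based filtering with github_create_pr dropped from the Dev set might look like this (QA_TOOL_NAMES and filter_tools_for_role are assumptions for illustration; only the DEV_TOOL_NAMES contents come from the review's suggested fix):

```python
# Dev allowlist per the review's recommendation: the non-idempotent
# github_create_pr is excluded from the iterative loop.
DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list",
                  "github_read_diff"}
# Assumed read-only QA subset (the exact QA set is not shown in this thread).
QA_TOOL_NAMES = {"filesystem_read", "filesystem_list", "github_read_diff"}

def filter_tools_for_role(all_tools, allowed_names):
    """Return only the tools whose name is in the role's allowlist."""
    return [t for t in all_tools if t["name"] in allowed_names]

tools = [{"name": n} for n in ("filesystem_read", "filesystem_write",
                               "github_create_pr")]
dev_tools = filter_tools_for_role(tools, DEV_TOOL_NAMES)
print(sorted(t["name"] for t in dev_tools))  # -> ['filesystem_read', 'filesystem_write']
```

PR creation then lives outside the loop entirely, invoked only from an explicit post-review or human-approved step.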
- Around line 305-310: The current code calls the tool's internal coroutine or
invoke directly (checking tool.coroutine and calling tool.coroutine(tool_args))
which bypasses the tool wrapper and can break tools expecting unpacked kwargs;
update the call path to use the public async wrapper await
tool.ainvoke(tool_args) when awaiting tools, falling back to
tool.invoke(tool_args) only for sync tools, and then return
ToolMessage(content=str(result), tool_call_id=tool_id); specifically replace
references to tool.coroutine and direct coroutine invocation with await
tool.ainvoke(tool_args) while keeping tool.invoke(tool_args) for synchronous
calls so the tool wrapper/input handling remains intact.
- Line 408: The current run_task function must be converted to an async function
to support async graph nodes (since developer_node and qa_node are async);
change its definition to async def run_task(...), replace the synchronous
workflow.invoke(...) call with await workflow.ainvoke(...) (or await
workflow.astream(...) if streaming is intended), and update any call sites to
await run_task(...) so callers handle the coroutine; ensure you preserve the
existing use of config: dict | None and continue passing
config["configurable"]["tools"] into _get_agent_tools() as before.
In `@dev-suite/tests/test_tool_binding.py`:
- Around line 340-341: The test patches the names in src.orchestrator but
init_tools_config imports load_mcp_config, create_provider and get_tools inside
the function, so those patches miss; update the test to patch the actual
defining module (patch "src.tools.load_mcp_config", "src.tools.create_provider",
and "src.tools.get_tools" or the correct modules where those functions are
implemented) or alternatively refactor init_tools_config to hoist the imports so
patch("src.orchestrator.load_mcp_config") etc. will work; locate the symbols
init_tools_config, load_mcp_config, create_provider, and get_tools to apply the
change.
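The "patch the defining module" point can be demonstrated in miniature (fake_tools stands in for src.tools; the principle is that a function-local import resolves the name from the defining module at call time, so mock.patch must target that module, not the importer):

```python
import sys
import types
from unittest import mock

# Build a stand-in defining module, mirroring the src.tools layout from the review.
tools_mod = types.ModuleType("fake_tools")
tools_mod.get_tools = lambda: ["real_tool"]
sys.modules["fake_tools"] = tools_mod

def init_tools_config():
    # Import inside the function body, as in the PR: the name is looked up
    # on the defining module each time this function runs.
    from fake_tools import get_tools
    return {"tools": get_tools()}

# Patching the *defining* module works even for function-local imports;
# patching the importing module (e.g. src.orchestrator) would miss entirely.
with mock.patch("fake_tools.get_tools", return_value=["mock_tool"]):
    print(init_tools_config())  # -> {'tools': ['mock_tool']}
```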
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 45ce17d1-9ce1-43d5-ad14-fef29bb1520c
📒 Files selected for processing (5)
.branch-placeholder
dev-suite/src/api/events.py
dev-suite/src/api/runner.py
dev-suite/src/orchestrator.py
dev-suite/tests/test_tool_binding.py
💤 Files with no reviewable changes (1)
- .branch-placeholder
QA_ESCALATION = "qa_escalation"
TOOL_CALL = "tool_call"
Update the dashboard SSE contract for tool_call.
This enum addition is still invisible to the browser: dashboard/src/lib/types/api.ts (Lines 172-177) excludes 'tool_call', and dashboard/src/lib/sse.ts (Lines 44-80 and 137-150) neither subscribes to nor dispatches it. The runner can publish the event, but the dashboard will drop it, so Issue #80's visibility goal stays broken. Based on learnings: Use SSE client with window event dispatch for real-time dashboard updates from backend.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dev-suite/src/api/events.py` around lines 52 - 53, The dashboard doesn't
recognize the new QA_ESCALATION/TOOL_CALL event because the dashboard type union
and SSE wiring lack 'tool_call'; update the dashboard API types (add 'tool_call'
to the union in api type definitions) and add handling in the SSE client
(subscribe for 'tool_call' in the SSE handling code and dispatch it to the rest
of the app using the same window event pattern used for other events so the
dashboard receives real-time updates). Ensure you reference the same event
string "tool_call" in the type union, the SSE subscription switch/handler, and
the window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.
dev-suite/src/orchestrator.py
Outdated
- def developer_node(state: GraphState) -> dict:
+ async def developer_node(state: GraphState, config: dict | None = None) -> dict:
      """Lead Dev: executes the Blueprint and generates code."""
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'async def developer_node|async def qa_node|config: dict \| None|workflow\.invoke\(' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 301
🏁 Script executed:
sed -n '741,760p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1701
🏁 Script executed:
head -40 dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1460
🏁 Script executed:
rg -n 'def run_task|RunnableConfig' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 214
🏁 Script executed:
rg -n 'from langchain|from langgraph|import.*RunnableConfig' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 269
🏁 Script executed:
rg -n 'run_task\(' dev-suite/src/orchestrator.py | head -20
Repository: Abernaughty/agent-dev
Length of output: 149
🏁 Script executed:
wc -l dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 98
🏁 Script executed:
sed -n '400,415p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1163
🏁 Script executed:
sed -n '520,535p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 948
🏁 Script executed:
sed -n '1,50p' dev-suite/src/orchestrator.py | tail -20
Repository: Abernaughty/agent-dev
Length of output: 700
🏁 Script executed:
sed -n '30,35p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 285
🏁 Script executed:
rg -n 'ainvoke|astream' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 134
🏁 Script executed:
python3 << 'EOF'
# Check if RunnableConfig is available in langchain_core
try:
from langchain_core.runnables import RunnableConfig
print("RunnableConfig is available in langchain_core.runnables")
except ImportError:
print("RunnableConfig not found in langchain_core.runnables")
EOF
Repository: Abernaughty/agent-dev
Length of output: 117
🏁 Script executed:
python3 << 'EOF'
# Check various possible locations for RunnableConfig
locations = [
"langchain_core.runnables",
"langchain_core.runnables.config",
"langgraph.types",
"langgraph.graph",
]
for loc in locations:
try:
module_parts = loc.rsplit(".", 1)
if len(module_parts) == 2:
module, name = module_parts
exec(f"from {module} import {name}")
else:
exec(f"import {loc}")
print(f"✓ Found in {loc}")
except ImportError:
print(f"✗ Not in {loc}")
EOF
Repository: Abernaughty/agent-dev
Length of output: 189
🏁 Script executed:
sed -n '741,755p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1701
🏁 Script executed:
rg -n 'def create_workflow' dev-suite/src/orchestrator.py -A 5
Repository: Abernaughty/agent-dev
Length of output: 210
🏁 Script executed:
python3 << 'EOF'
# Check what version of langchain and langgraph are available
try:
import langchain_core
print(f"langchain_core version: {langchain_core.__version__ if hasattr(langchain_core, '__version__') else 'unknown'}")
except:
print("langchain_core version: could not determine")
try:
import langgraph
print(f"langgraph version: {langgraph.__version__ if hasattr(langgraph, '__version__') else 'unknown'}")
except:
print("langgraph version: could not determine")
EOF
Repository: Abernaughty/agent-dev
Length of output: 147
🏁 Script executed:
rg -n 'class.*Config|RunnableConfig' dev-suite/src/ --type python -A 2 | head -40
Repository: Abernaughty/agent-dev
Length of output: 95
🏁 Script executed:
rg -n 'config' dev-suite/src/orchestrator.py | grep -E 'developer_node|qa_node' -B 2 -A 2
Repository: Abernaughty/agent-dev
Length of output: 519
🏁 Script executed:
grep -r 'RunnableConfig' dev-suite/ 2>/dev/null || echo "No RunnableConfig found in codebase"
Repository: Abernaughty/agent-dev
Length of output: 231
🏁 Script executed:
sed -n '408,430p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1317
🏁 Script executed:
sed -n '280,295p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 487
🏁 Script executed:
rg -n '_get_agent_tools' dev-suite/src/orchestrator.py -A 10 | head -20
Repository: Abernaughty/agent-dev
Length of output: 852
🏁 Script executed:
sed -n '1,80p' dev-suite/src/orchestrator.py | tail -30
Repository: Abernaughty/agent-dev
Length of output: 850
🏁 Script executed:
rg -n 'from langgraph' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 118
🏁 Script executed:
python3 << 'EOF'
# Check what methods a compiled graph has
from langgraph.graph import StateGraph
from langgraph.graph import END, START
import inspect
# Create a minimal graph to inspect its methods
graph = StateGraph(dict)
graph.add_node("test", lambda x: x)
compiled = graph.compile()
# Check for async methods
methods = [m for m in dir(compiled) if not m.startswith('_')]
print("Available methods on compiled graph:")
for method in sorted(methods):
print(f" {method}")
if 'ainvoke' in methods:
print("\n✓ ainvoke available")
if 'invoke' in methods:
print("✓ invoke available")
EOF
Repository: Abernaughty/agent-dev
Length of output: 186
🏁 Script executed:
grep -r 'run_task(' dev-suite/ --include='*.py' | grep -v '^Binary' | head -20
Repository: Abernaughty/agent-dev
Length of output: 2031
🏁 Script executed:
sed -n '715,725p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 275
🏁 Script executed:
python3 << 'EOF'
# Check LangGraph documentation/behavior for async graphs
# Even without langgraph installed, we can reason about the error message
error_msg = 'TypeError: No synchronous function provided to "developer"'
print("Error analysis:")
print(f"- Error suggests async node 'developer' cannot be run with sync invoke()")
print(f"- This occurs when async nodes are in the graph but invoke() is used")
print(f"- Solutions:")
print(f" 1. Make all nodes sync (revert async)")
print(f" 2. Use ainvoke() or astream() instead of invoke()")
print(f" 3. Make run_task() async to await ainvoke()")
EOF
Repository: Abernaughty/agent-dev
Length of output: 373
Switch run_task() to async invocation to support async graph nodes.
developer_node() and qa_node() are async, but run_task() still uses workflow.invoke() (sync invocation) on line 751. This causes the CI error TypeError: No synchronous function provided to "developer". LangGraph requires .ainvoke() or .astream() for graphs containing async nodes. Make run_task() async and await workflow.ainvoke() instead of calling invoke().
The config: dict | None parameter type is correct and intentional—it's used to pass tools via config["configurable"]["tools"] (see _get_agent_tools() at line 282).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dev-suite/src/orchestrator.py` at line 408, The current run_task function
must be converted to an async function to support async graph nodes (since
developer_node and qa_node are async); change its definition to async def
run_task(...), replace the synchronous workflow.invoke(...) call with await
workflow.ainvoke(...) (or await workflow.astream(...) if streaming is intended),
and update any call sites to await run_task(...) so callers handle the
coroutine; ensure you preserve the existing use of config: dict | None and
continue passing config["configurable"]["tools"] into _get_agent_tools() as
before.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dev-suite/tests/test_e2e.py (1)
140-141: ⚠️ Potential issue | 🔴 Critical
E2E pipeline still uses synchronous invocation with async nodes — must migrate to async path
The full-pipeline E2E tests call run_task() synchronously, but developer_node (line 408) and qa_node (line 525) are now async. Invoking a mixed async/sync StateGraph with .invoke() will fail with TypeError: No synchronous function provided to "developer". Either:
- Convert run_task() to async and await it in E2E tests, or
- Use workflow.ainvoke() instead of workflow.invoke() at line 751.
Node-level tests are correctly converted to async with @pytest.mark.asyncio, but the full-pipeline path at lines 140–545 remains synchronous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_e2e.py` around lines 140 - 141, The E2E test calls run_task() synchronously but developer_node and qa_node are async, so update run_task to use the async invocation path: either make run_task an async function and await it from tests, or modify run_task to call workflow.ainvoke() (instead of workflow.invoke()) when executing the StateGraph so async nodes are awaited; locate run_task, developer_node, qa_node and the workflow.invoke call and switch to the async pattern and update the E2E test to await run_task if you convert it to async.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@dev-suite/tests/test_e2e.py`:
- Around line 140-141: The E2E test calls run_task() synchronously but
developer_node and qa_node are async, so update run_task to use the async
invocation path: either make run_task an async function and await it from tests,
or modify run_task to call workflow.ainvoke() (instead of workflow.invoke())
when executing the StateGraph so async nodes are awaited; locate run_task,
developer_node, qa_node and the workflow.invoke call and switch to the async
pattern and update the E2E test to await run_task if you convert it to async.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f92fbd4e-dc11-4eb6-a37c-3bc7eef817d7
📒 Files selected for processing (1)
dev-suite/tests/test_e2e.py
…ndbox_validate_node (#80)
- developer_node and qa_node are now sync (def, not async def)
- Tool loop runs via _run_async() bridge when tools present
- No-tools path stays fully synchronous (zero regression)
- Fixed sandbox_validate_node: parsed_files=None guard, WARNING trace
- Fixes LangGraph sync invoke() TypeError for async nodes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file dev-suite/tests/test_e2e.py currently contains an undefined
identifier RESTORE_FROM_MAIN which causes a lint/build failure; either restore
the real test file from main (replace the placeholder with the original test
logic) or delete test_e2e.py if it’s intentionally removed and you will rely on
test_e2e_live.py for E2E coverage; ensure the branch contains no stray
RESTORE_FROM_MAIN token (remove or replace it) so ruff F821 is resolved before
merging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ae6ef23-4e9a-4fbf-a5c4-31f598093ac6
📒 Files selected for processing (1)
dev-suite/tests/test_e2e.py
Actionable comments posted: 1
♻️ Duplicate comments (4)
dev-suite/tests/test_e2e.py (1)
1-1: ⚠️ Potential issue | 🔴 Critical
Replace the placeholder before merge.
Line 1 is still a bare RESTORE_FROM_MAIN token. That trips Ruff F821 and leaves this module with no runnable tests. Please either restore the real dev-suite/tests/test_e2e.py contents in this branch or delete the file if the suite is intentionally being retired.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_e2e.py` at line 1, The file contains a placeholder token RESTORE_FROM_MAIN causing Ruff F821 and leaving no tests; replace that token by restoring the original test module contents for dev-suite/tests/test_e2e.py (reintroduce the actual test functions, imports, and fixtures previously present) or, if the e2e suite is intentionally retired, remove dev-suite/tests/test_e2e.py entirely and update any test suite config referencing it; ensure the restored file defines runnable tests (e.g., proper pytest test_ functions and imports) so linters and test discovery pass.
dev-suite/src/orchestrator.py (3)
239-243: ⚠️ Potential issue | 🟠 Major
Use LangChain’s public tool wrappers (ainvoke/invoke) instead of tool.coroutine.
Calling tool.coroutine(...) directly bypasses wrapper behavior and can break tool input/config handling.
Suggested fix
- if hasattr(tool, "coroutine") and tool.coroutine:
-     result = await tool.coroutine(tool_args)
- else:
-     result = tool.invoke(tool_args)
+ if hasattr(tool, "ainvoke"):
+     result = await tool.ainvoke(tool_args)
+ else:
+     result = tool.invoke(tool_args)
In LangChain Core 0.3.x, what is the recommended public API for async tool execution: `tool.coroutine(...)` or `await tool.ainvoke(...)`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` around lines 239 - 243, The code is calling tool.coroutine(...) directly which bypasses LangChain's public wrappers; change the async branch to await the public wrapper (await tool.ainvoke(tool_args)) and keep the sync branch using tool.invoke(tool_args), so replace references to tool.coroutine with tool.ainvoke to preserve input/config handling for tools (keep returning ToolMessage(content=str(result), tool_call_id=tool_id)).
214-215: ⚠️ Potential issue | 🔴 Critical
Remove github_create_pr from the iterative Dev tool allowlist.
This enables a non-idempotent external write inside a retrying tool loop and can create duplicate/partial PRs before QA/safety gates complete.
Suggested fix
- DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff", "github_create_pr"}
+ DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` around lines 214 - 215, DEV_TOOL_NAMES currently includes "github_create_pr", which allows a non-idempotent external write inside a retrying loop; remove "github_create_pr" from the DEV_TOOL_NAMES set so only safe idempotent tools remain (leave QA_TOOL_NAMES unchanged). Update the DEV_TOOL_NAMES definition (symbol DEV_TOOL_NAMES) to exclude "github_create_pr" and run related tests/linters to ensure no other code relies on that tool being in the dev allowlist.
267-267: ⚠️ Potential issue | 🟠 Major
Redact tool args/results before persisting to tool_calls_log.
args_preview/result_preview can include code or secrets (e.g., file contents, diffs) and are later surfaced downstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` at line 267, The current append to tool_calls_log stores raw args/result previews which may contain secrets or large code; change this to sanitize and redact before persisting by introducing and using a sanitizer (e.g., sanitize_preview) when building the dict for tool_calls_log.append: call it on tc.get("args", {}) and on tool_msg.content to mask known sensitive keys, remove long file contents/diffs, and replace with a fixed token or truncated summary, then truncate to 200 chars; update references to tc and tool_msg usage in the append site so only sanitized/trimmed strings are stored in args_preview and result_preview and mark success as before.
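A redaction helper along the lines this comment asks for could look like the sketch below (SENSITIVE_KEYS and the token regexes are illustrative assumptions, not the PR's actual _sanitize_preview):

```python
import re

# Assumed key names that indicate file/secret content.
SENSITIVE_KEYS = {"content", "data", "diff", "token", "secret", "password"}
# Example token shapes (GitHub PAT, AWS access key) for pattern-based masking.
SECRET_PATTERN = re.compile(r"(ghp_[A-Za-z0-9]+|AKIA[A-Z0-9]{16})")

def sanitize_preview(value, max_len=200):
    """Redact likely-sensitive material and truncate before logging."""
    if isinstance(value, dict):
        return {k: ("[REDACTED]" if k.lower() in SENSITIVE_KEYS
                    else sanitize_preview(v, max_len))
                for k, v in value.items()}
    text = str(value)
    text = SECRET_PATTERN.sub("[REDACTED]", text)
    return text[:max_len]  # cap previews at 200 chars, per the prompt

args = {"path": "src/app.py", "content": "API_KEY = 'ghp_abc123'"}
print(sanitize_preview(args))  # -> {'path': 'src/app.py', 'content': '[REDACTED]'}
```

Applying such a helper to both args_preview and result_preview at the append site keeps filenames and sizes visible while keeping raw file contents out of GraphState and TOOL_CALL events.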
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 249-270: The loop can skip entirely when max_turns <= 0 leaving
`response` undefined; in _run_tool_loop validate max_turns at the start (or
handle the case before the for-loop) and return a sensible result instead of
relying on `response` being set: append a trace entry like f"{agent_name}: tool
loop skipped due to max_turns={max_turns}", log a warning via logger, and return
the last message from the incoming `messages` (e.g., messages[-1] or a fallback
empty response), current tokens_used, and an empty tool_calls_log so callers of
_run_tool_loop never receive an undefined `response`.
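The guard the prompt describes can be sketched as follows (names and return shape follow the prompt's wording; the real _run_tool_loop signature may differ):

```python
import logging

logger = logging.getLogger(__name__)

def run_tool_loop(agent_name, messages, tokens_used, max_turns, trace):
    """Guard: with max_turns <= 0 the for-loop body never runs, so return a
    defined fallback instead of falling through to an unset `response`."""
    if max_turns <= 0:
        trace.append(f"{agent_name}: tool loop skipped due to max_turns={max_turns}")
        logger.warning("tool loop skipped for %s (max_turns=%s)", agent_name, max_turns)
        return messages[-1] if messages else None, tokens_used, []
    # ... the normal iterative tool-calling loop would run here ...
    return messages[-1], tokens_used, []

trace = []
resp, tokens, log = run_tool_loop("developer", ["last message"], 42, 0, trace)
print(resp, tokens, log)  # -> last message 42 []
```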
---
Duplicate comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 239-243: The code is calling tool.coroutine(...) directly which
bypasses LangChain's public wrappers; change the async branch to await the
public wrapper (await tool.ainvoke(tool_args)) and keep the sync branch using
tool.invoke(tool_args), so replace references to tool.coroutine with
tool.ainvoke to preserve input/config handling for tools (keep returning
ToolMessage(content=str(result), tool_call_id=tool_id)).
- Around line 214-215: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside a retrying loop; remove
"github_create_pr" from the DEV_TOOL_NAMES set so only safe idempotent tools
remain (leave QA_TOOL_NAMES unchanged). Update the DEV_TOOL_NAMES definition
(symbol DEV_TOOL_NAMES) to exclude "github_create_pr" and run related
tests/linters to ensure no other code relies on that tool being in the dev
allowlist.
- Line 267: The current append to tool_calls_log stores raw args/result previews
which may contain secrets or large code; change this to sanitize and redact
before persisting by introducing and using a sanitizer (e.g., sanitize_preview)
when building the dict for tool_calls_log.append: call it on tc.get("args", {})
and on tool_msg.content to mask known sensitive keys, remove long file
contents/diffs, and replace with a fixed token or truncated summary, then
truncate to 200 chars; update references to tc and tool_msg usage in the append
site so only sanitized/trimmed strings are stored in args_preview and
result_preview and mark success as before.
In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file contains a placeholder token RESTORE_FROM_MAIN causing Ruff
F821 and leaving no tests; replace that token by restoring the original test
module contents for dev-suite/tests/test_e2e.py (reintroduce the actual test
functions, imports, and fixtures previously present) or, if the e2e suite is
intentionally retired, remove dev-suite/tests/test_e2e.py entirely and update
any test suite config referencing it; ensure the restored file defines runnable
tests (e.g., proper pytest test_ functions and imports) so linters and test
discovery pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39f31fc9-42e8-4845-a135-5caae6db58e2
📒 Files selected for processing (2)
dev-suite/src/orchestrator.py
dev-suite/tests/test_e2e.py
Fix 2: Remove github_create_pr from DEV_TOOL_NAMES (non-idempotent in retry loop)
Fix 3: Use ainvoke/invoke public API instead of tool.coroutine
Fix 4: Add _sanitize_preview() to redact secrets from tool_calls_log
Fix 5: Guard max_turns <= 0 in _run_tool_loop
Fix 8: Add design note comment explaining _run_async sync bridge choice
Add import re for secret pattern matching
Track _dev_tool_baselines per task_id so _handle_developer only counts tool calls from the current developer pass, not cumulative across retries.
- FakeTool now uses ainvoke/invoke API (Fix 3), add SyncOnlyFakeTool
- test_dev_tools_filtered_to_dev_set expects github_create_pr excluded (Fix 2)
- test_dev_has_no_pr_creation: new test for Fix 2
- TestSanitizePreview: 6 new tests for Fix 4
- test_max_turns_zero_returns_immediately: new test for Fix 5
… ValidationPlan
The compacted sandbox_validate_node was calling format_validation_summary(result) where result is a SandboxResult, but the function expects a ValidationPlan. Reverted to direct field access: result.exit_code, result.tests_passed, result.tests_failed — matching the original pre-PR behavior.
Summary
Closes #80. Gives the Dev and QA agents the ability to use tools (read files, write files, run commands) during their turn, transforming them from single-shot text generators into iterative, tool-using agents.
What Changed
orchestrator.py (Major)
- tool_calls_log added to GraphState and AgentState for tracking tool usage
- MAX_TOOL_TURNS config (default 10, env MAX_TOOL_TURNS) caps tool iterations
- DEV_TOOL_NAMES / QA_TOOL_NAMES — role-based tool sets. Dev gets read+write, QA gets read-only
- _get_agent_tools(config, allowed_names) — extracts tools from RunnableConfig["configurable"]["tools"]
- _execute_tool_call() — executes a single tool call, prefers async coroutine over sync
- _run_tool_loop() — iterative loop: LLM -> tool_calls -> execute -> feed results -> repeat until done or max_turns
- developer_node → now async, accepts config param. If tools available: bind_tools() + tool loop. If not: original single-shot behavior (zero regression)
- qa_node → now async, accepts config param. Gets read-only tool subset (max 5 turns). Same fallback pattern
- init_tools_config() — factory that creates a ToolProvider, loads tools from mcp-config.json, returns a RunnableConfig-compatible dict
- run_task() — now initializes tools config and merges into invoke config
events.py (Minor)
- TOOL_CALL = "tool_call" added to EventType enum
runner.py (Medium)
- init_tools_config imported from orchestrator
- TOOL_CALL SSE events emitted for each new tool call during streaming
- _handle_developer includes tool call count in timeline action
- New _emit_tool_call() method
test_tool_binding.py (New — 35 tests)
- _get_agent_tools: 8 tests (empty config, filtering, role subsets)
- _execute_tool_call: 5 tests (success, unknown tool, exception, async/sync)
- _run_tool_loop: 5 tests (no calls, single turn, max turns, token accumulation, failure)
- developer_node with tools: 5 tests (bind_tools, single-shot, no config, tool logging, filtering)
- qa_node with tools: 2 tests (read-only subset, without tools)
- init_tools_config: 3 tests (no file, invalid JSON, valid config)
Design Decisions
Testing
All files pass ast.parse() syntax validation. 35 unit tests with full mock coverage.
Summary by CodeRabbit
New Features
Improvements
Tests