Skip to content

feat: Agent tool binding — give Dev and QA agents access to workspace tools (#80)#83

Merged
Abernaughty merged 17 commits intomainfrom
feat/agent-tool-binding
Apr 1, 2026
Merged

feat: Agent tool binding — give Dev and QA agents access to workspace tools (#80)#83
Abernaughty merged 17 commits intomainfrom
feat/agent-tool-binding

Conversation

@Abernaughty
Copy link
Copy Markdown
Owner

@Abernaughty Abernaughty commented Apr 1, 2026

Summary

Closes #80. Gives the Dev and QA agents the ability to use tools (read files, write files, run commands) during their turn, transforming them from single-shot text generators into iterative, tool-using agents.

What Changed

orchestrator.py (Major)

  • tool_calls_log added to GraphState and AgentState for tracking tool usage
  • MAX_TOOL_TURNS config (default 10, env MAX_TOOL_TURNS) caps tool iterations
  • DEV_TOOL_NAMES / QA_TOOL_NAMES — role-based tool sets. Dev gets read+write, QA gets read-only
  • _get_agent_tools(config, allowed_names) — extracts tools from RunnableConfig["configurable"]["tools"]
  • _execute_tool_call() — executes a single tool call, prefers async coroutine over sync
  • _run_tool_loop() — iterative loop: LLM -> tool_calls -> execute -> feed results -> repeat until done or max_turns
  • developer_node → now async, accepts config param. If tools available: bind_tools() + tool loop. If not: original single-shot behavior (zero regression)
  • qa_node → now async, accepts config param. Gets read-only tool subset (max 5 turns). Same fallback pattern
  • init_tools_config() — factory that creates a ToolProvider, loads tools from mcp-config.json, returns a RunnableConfig-compatible dict
  • run_task() — now initializes tools config and merges into invoke config

events.py (Minor)

  • Added TOOL_CALL = "tool_call" to EventType enum

runner.py (Medium)

  • Imports init_tools_config from orchestrator
  • Initializes tools config at task start, logs tool count
  • Emits TOOL_CALL SSE events for each new tool call during streaming
  • _handle_developer includes tool call count in timeline action
  • New _emit_tool_call() method

test_tool_binding.py (New — 35 tests)

  • _get_agent_tools: 8 tests (empty config, filtering, role subsets)
  • _execute_tool_call: 5 tests (success, unknown tool, exception, async/sync)
  • _run_tool_loop: 5 tests (no calls, single turn, max turns, token accumulation, failure)
  • developer_node with tools: 5 tests (bind_tools, single-shot, no config, tool logging, filtering)
  • qa_node with tools: 2 tests (read-only subset, without tools)
  • init_tools_config: 3 tests (no file, invalid JSON, valid config)
  • Events: 3 tests (TOOL_CALL type, SSE creation, publish)
  • State: 3 tests (GraphState field, AgentState field/init)
  • Config: 1 test (MAX_TOOL_TURNS default)

Design Decisions

  • Option B (RunnableConfig) chosen over ToolNode — tools passed via config, not state. Keeps GraphState JSON-serializable
  • Graceful degradation — no mcp-config.json or no tools = original single-shot behavior. Zero breaking changes
  • apply_code_node unchanged — still serves as safety net for code written in the LLM's final text response, even when tools are used
  • QA read-only — QA cannot write files or create PRs, only inspect

Testing

All files pass ast.parse() syntax validation. 35 unit tests with full mock coverage.

Summary by CodeRabbit

  • New Features

    • Tool binding and iterative tool execution for developer and QA agents.
    • Real-time tool-call events streamed for visibility into tool invocations.
  • Improvements

    • Tool invocation counts and result previews included in task logs and final traces.
    • Clarified task cancellation messaging.
  • Tests

    • Added extensive tests for tool binding, execution loops, events, and async behavior; e2e tests reformatted (no behavioral changes).

…tools_config (#80)

- Add tool_calls_log to GraphState and AgentState
- Add MAX_TOOL_TURNS config (default 10)
- Add DEV_TOOL_NAMES and QA_TOOL_NAMES role-based tool sets
- Add _get_agent_tools() to extract tools from RunnableConfig
- Add _execute_tool_call() for individual tool execution
- Add _run_tool_loop() for iterative tool calling
- Convert developer_node to async with tool binding support
- Convert qa_node to async with read-only tool subset
- Add init_tools_config() factory for workspace tool setup
- Update run_task() to include tools in invoke config
- Graceful fallback: no tools = original single-shot behavior
- Add tool_calls_log to GraphState and AgentState
- Add MAX_TOOL_TURNS config (default 10)
- Add DEV_TOOL_NAMES and QA_TOOL_NAMES role-based tool sets
- Add _get_agent_tools(), _execute_tool_call(), _run_tool_loop()
- Convert developer_node and qa_node to async with tool binding
- Add init_tools_config() factory for workspace tool setup
- Update run_task() to include tools in invoke config
- Graceful fallback: no tools = original single-shot behavior
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 1, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds workspace tool support: tools are initialized and injected into workflows; Dev/QA LLMs are bound to role-filtered tools; an iterative tool-execution loop records tool calls, emits TOOL_CALL SSE events, and developer/qa nodes accept an optional config.

Changes

Cohort / File(s) Summary
Event System
dev-suite/src/api/events.py
Added EventType.TOOL_CALL = "tool_call" to represent tool invocation SSE events.
Task Runner / SSE
dev-suite/src/api/runner.py
Initialize tools_config with init_tools_config() and pass into workflow.astream() config; track tool_calls_log in initial state; emit TOOL_CALL events via new _emit_tool_call() helper; adjust completion message text.
Orchestrator & Tooling
dev-suite/src/orchestrator.py
Added init_tools_config(); bind role-filtered tools to LLM via llm.bind_tools(); introduced async tool execution loop and helpers (_run_tool_loop, _execute_tool_call, _run_async); extended GraphState/AgentState with tool_calls_log, added MAX_TOOL_TURNS, token accounting; updated developer_node/qa_node signatures to accept optional config.
Tests
dev-suite/tests/test_tool_binding.py, dev-suite/tests/test_e2e.py
Added comprehensive test_tool_binding.py exercising tool binding, execution loop, events, and init behavior; test_e2e.py formatting/content edited (docstrings/comments and mocks refactored).
Misc / Placeholder
.branch-placeholder
Removed file literal placeholder content (file emptied / placeholder removed).

Sequence Diagram(s)

sequenceDiagram
    participant TaskRunner
    participant Orchestrator
    participant LLM as "LLM (bound)"
    participant ToolProvider
    participant EventBus

    TaskRunner->>Orchestrator: init_tools_config(workspace_root)
    Orchestrator->>ToolProvider: load tools (mcp-config.json)
    ToolProvider-->>Orchestrator: tools[]
    Orchestrator-->>TaskRunner: tools_config

    TaskRunner->>Orchestrator: run_task(config includes tools_config)
    Orchestrator->>LLM: bind_tools(tools_for_role)
    Orchestrator->>LLM: invoke(messages)
    LLM-->>Orchestrator: response (may include tool_calls)

    loop tool call loop (<= MAX_TOOL_TURNS)
        Orchestrator->>ToolProvider: execute_tool(tool_call)
        ToolProvider-->>Orchestrator: tool_result
        Orchestrator->>LLM: invoke(messages + ToolMessage(tool_result))
        LLM-->>Orchestrator: response (more tool_calls or final)
        Orchestrator->>EventBus: publish TOOL_CALL event
        EventBus-->>TaskRunner: TOOL_CALL SSE event
    end

    Orchestrator-->>TaskRunner: final state (includes tool_calls_log)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 Hop, hop—tools at paw and mind,
I bind, I call, I log each find.
Events go out, the timeline sings,
Dev reads, QA peeks at things.
A little rabbit claps and grins.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the pull request — adding tool binding so Dev and QA agents can access workspace tools, directly matching the primary changes across orchestrator.py, runner.py, and events.py.
Linked Issues check ✅ Passed The pull request successfully implements all key requirements from issue #80: tool binding (orchestrator.py), tool execution loops with per-turn limits (developer_node/qa_node), role-scoped tools (DEV_TOOL_NAMES/QA_TOOL_NAMES), graceful fallback, tool-call SSE events, state extensions (tool_calls_log), and comprehensive test coverage.
Out of Scope Changes check ✅ Passed All changes directly support issue #80 objectives: orchestrator.py adds tool binding and loop logic, runner.py emits tool-call events, events.py adds TOOL_CALL type, tests validate tool behavior. Minimal cleanup (test_e2e.py docstring/formatting) is justified maintenance. No extraneous changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-tool-binding

Comment @coderabbitai help to get the list of available commands and usage tips.

…a_node (#80)

developer_node and qa_node are now async (issue #80). Updated 4 tests:
- test_developer_node_generates_code -> async + await
- test_qa_node_returns_pass -> async + await
- test_qa_node_returns_fail -> async + await
- test_qa_node_returns_escalate -> async + await

Pipeline tests (run_task) remain sync -- LangGraph handles async nodes
internally via workflow.invoke().
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/api/events.py`:
- Around line 52-53: The dashboard doesn't recognize the new
QA_ESCALATION/TOOL_CALL event because the dashboard type union and SSE wiring
lack 'tool_call'; update the dashboard API types (add 'tool_call' to the union
in api type definitions) and add handling in the SSE client (subscribe for
'tool_call' in the SSE handling code and dispatch it to the rest of the app
using the same window event pattern used for other events so the dashboard
receives real-time updates). Ensure you reference the same event string
"tool_call" in the type union, the SSE subscription switch/handler, and the
window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.

In `@dev-suite/src/api/runner.py`:
- Around line 337-350: The current logic uses the cumulative tool_calls_log to
compute dev_tool_calls, causing counts to include previous retries; to fix,
record the baseline index/length of output["tool_calls_log"] at the start of the
developer pass (e.g., save a variable like dev_pass_tool_calls_start =
len(output.get("tool_calls_log", [])) before generating code), and when
computing dev_tool_calls replace the filter with only the new slice
output["tool_calls_log"][dev_pass_tool_calls_start:] (then filter by
tc.get("agent") == "developer") so the action string uses only tool calls from
the current developer pass; update any references in the code block that builds
action/event_type/retry to use this per-pass dev_tool_calls.

In `@dev-suite/src/orchestrator.py`:
- Around line 619-623: The call site in orchestrator.py is dereferencing
plan.timeout which raises AttributeError because ValidationPlan lacks a timeout
attribute; fix by either adding a timeout field/default to the ValidationPlan
class or stop passing plan.timeout here and let _run_sandbox_validation() use
its own default. Locate the ValidationPlan definition (class ValidationPlan) and
add an optional timeout attribute with a reasonable default, or remove the
timeout argument from the _run_sandbox_validation(...) call in the
sandbox_validate path so the helper's internal default is used; update any
callers/tests accordingly to keep behavior consistent.
- Around line 345-350: tool_calls_log currently saves raw previews in
args_preview and result_preview which can leak file contents or secrets (e.g.,
from filesystem_write, filesystem_read, github_read_diff) into GraphState and
TOOL_CALL events; instead, sanitize and persist only redacted metadata: for
args_preview derive and store non-sensitive fields such as filenames, sizes,
content hashes or a fixed "[REDACTED]" tag when any arg key indicates
file/content, and for result_preview detect large/plaintext content and replace
with metadata like "type":"file"|"text", "size":N, "hash":H or "[REDACTED]" if
content appears to contain source/secrets, updating the tool_calls_log
population logic to call a redaction helper (e.g., redact_tool_preview) before
assigning args_preview/result_preview so no raw content is written to GraphState
or emitted in TOOL_CALL.
- Around line 268-274: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside retrying LLM loops; remove
"github_create_pr" from the DEV_TOOL_NAMES set to prevent automatic PR creation
during iterative tool runs, and instead move PR creation to an explicit
post-review flow or wrap it behind an approval/confirmation guard; update any
code paths or tests that reference DEV_TOOL_NAMES to account for its removal and
ensure PR creation is only invoked from the designated
post-review/human-approved function.
- Around line 305-310: The current code calls the tool's internal coroutine or
invoke directly (checking tool.coroutine and calling tool.coroutine(tool_args))
which bypasses the tool wrapper and can break tools expecting unpacked kwargs;
update the call path to use the public async wrapper await
tool.ainvoke(tool_args) when awaiting tools, falling back to
tool.invoke(tool_args) only for sync tools, and then return
ToolMessage(content=str(result), tool_call_id=tool_id); specifically replace
references to tool.coroutine and direct coroutine invocation with await
tool.ainvoke(tool_args) while keeping tool.invoke(tool_args) for synchronous
calls so the tool wrapper/input handling remains intact.
- Line 408: The current run_task function must be converted to an async function
to support async graph nodes (since developer_node and qa_node are async);
change its definition to async def run_task(...), replace the synchronous
workflow.invoke(...) call with await workflow.ainvoke(...) (or await
workflow.astream(...) if streaming is intended), and update any call sites to
await run_task(...) so callers handle the coroutine; ensure you preserve the
existing use of config: dict | None and continue passing
config["configurable"]["tools"] into _get_agent_tools() as before.

In `@dev-suite/tests/test_tool_binding.py`:
- Around line 340-341: The test patches the names in src.orchestrator but
init_tools_config imports load_mcp_config, create_provider and get_tools inside
the function, so those patches miss; update the test to patch the actual
defining module (patch "src.tools.load_mcp_config", "src.tools.create_provider",
and "src.tools.get_tools" or the correct modules where those functions are
implemented) or alternatively refactor init_tools_config to hoist the imports so
patch("src.orchestrator.load_mcp_config") etc. will work; locate the symbols
init_tools_config, load_mcp_config, create_provider, and get_tools to apply the
change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 45ce17d1-9ce1-43d5-ad14-fef29bb1520c

📥 Commits

Reviewing files that changed from the base of the PR and between c16aa43 and 5689745.

📒 Files selected for processing (5)
  • .branch-placeholder
  • dev-suite/src/api/events.py
  • dev-suite/src/api/runner.py
  • dev-suite/src/orchestrator.py
  • dev-suite/tests/test_tool_binding.py
💤 Files with no reviewable changes (1)
  • .branch-placeholder

Comment on lines 52 to +53
QA_ESCALATION = "qa_escalation"
TOOL_CALL = "tool_call"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Update the dashboard SSE contract for tool_call.

This enum addition is still invisible to the browser: dashboard/src/lib/types/api.ts (Lines 172-177) excludes 'tool_call', and dashboard/src/lib/sse.ts (Lines 44-80 and 137-150) neither subscribes to nor dispatches it. The runner can publish the event, but the dashboard will drop it, so Issue #80's visibility goal stays broken. Based on learnings: Use SSE client with window event dispatch for real-time dashboard updates from backend.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/api/events.py` around lines 52 - 53, The dashboard doesn't
recognize the new QA_ESCALATION/TOOL_CALL event because the dashboard type union
and SSE wiring lack 'tool_call'; update the dashboard API types (add 'tool_call'
to the union in api type definitions) and add handling in the SSE client
(subscribe for 'tool_call' in the SSE handling code and dispatch it to the rest
of the app using the same window event pattern used for other events so the
dashboard receives real-time updates). Ensure you reference the same event
string "tool_call" in the type union, the SSE subscription switch/handler, and
the window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.


def developer_node(state: GraphState) -> dict:
"""Lead Dev: executes the Blueprint and generates code."""
async def developer_node(state: GraphState, config: dict | None = None) -> dict:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -n 'async def developer_node|async def qa_node|config: dict \| None|workflow\.invoke\(' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 301


🏁 Script executed:

sed -n '741,760p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 1701


🏁 Script executed:

head -40 dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 1460


🏁 Script executed:

rg -n 'def run_task|RunnableConfig' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 214


🏁 Script executed:

rg -n 'from langchain|from langgraph|import.*RunnableConfig' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 269


🏁 Script executed:

rg -n 'run_task\(' dev-suite/src/orchestrator.py | head -20

Repository: Abernaughty/agent-dev

Length of output: 149


🏁 Script executed:

wc -l dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 98


🏁 Script executed:

sed -n '400,415p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 1163


🏁 Script executed:

sed -n '520,535p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 948


🏁 Script executed:

sed -n '1,50p' dev-suite/src/orchestrator.py | tail -20

Repository: Abernaughty/agent-dev

Length of output: 700


🏁 Script executed:

sed -n '30,35p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 285


🏁 Script executed:

rg -n 'ainvoke|astream' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 134


🏁 Script executed:

python3 << 'EOF'
# Check if RunnableConfig is available in langchain_core
try:
    from langchain_core.runnables import RunnableConfig
    print("RunnableConfig is available in langchain_core.runnables")
except ImportError:
    print("RunnableConfig not found in langchain_core.runnables")
EOF

Repository: Abernaughty/agent-dev

Length of output: 117


🏁 Script executed:

python3 << 'EOF'
# Check various possible locations for RunnableConfig
locations = [
    "langchain_core.runnables",
    "langchain_core.runnables.config",
    "langgraph.types",
    "langgraph.graph",
]

for loc in locations:
    try:
        module_parts = loc.rsplit(".", 1)
        if len(module_parts) == 2:
            module, name = module_parts
            exec(f"from {module} import {name}")
        else:
            exec(f"import {loc}")
        print(f"✓ Found in {loc}")
    except ImportError:
        print(f"✗ Not in {loc}")
EOF

Repository: Abernaughty/agent-dev

Length of output: 189


🏁 Script executed:

sed -n '741,755p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 1701


🏁 Script executed:

rg -n 'def create_workflow' dev-suite/src/orchestrator.py -A 5

Repository: Abernaughty/agent-dev

Length of output: 210


🏁 Script executed:

python3 << 'EOF'
# Check what version of langchain and langgraph are available
try:
    import langchain_core
    print(f"langchain_core version: {langchain_core.__version__ if hasattr(langchain_core, '__version__') else 'unknown'}")
except:
    print("langchain_core version: could not determine")

try:
    import langgraph
    print(f"langgraph version: {langgraph.__version__ if hasattr(langgraph, '__version__') else 'unknown'}")
except:
    print("langgraph version: could not determine")
EOF

Repository: Abernaughty/agent-dev

Length of output: 147


🏁 Script executed:

rg -n 'class.*Config|RunnableConfig' dev-suite/src/ --type python -A 2 | head -40

Repository: Abernaughty/agent-dev

Length of output: 95


🏁 Script executed:

rg -n 'config' dev-suite/src/orchestrator.py | grep -E 'developer_node|qa_node' -B 2 -A 2

Repository: Abernaughty/agent-dev

Length of output: 519


🏁 Script executed:

grep -r 'RunnableConfig' dev-suite/ 2>/dev/null || echo "No RunnableConfig found in codebase"

Repository: Abernaughty/agent-dev

Length of output: 231


🏁 Script executed:

sed -n '408,430p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 1317


🏁 Script executed:

sed -n '280,295p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 487


🏁 Script executed:

rg -n '_get_agent_tools' dev-suite/src/orchestrator.py -A 10 | head -20

Repository: Abernaughty/agent-dev

Length of output: 852


🏁 Script executed:

sed -n '1,80p' dev-suite/src/orchestrator.py | tail -30

Repository: Abernaughty/agent-dev

Length of output: 850


🏁 Script executed:

rg -n 'from langgraph' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 118


🏁 Script executed:

python3 << 'EOF'
# Check what methods a compiled graph has
from langgraph.graph import StateGraph
from langgraph.graph import END, START
import inspect

# Create a minimal graph to inspect its methods
graph = StateGraph(dict)
graph.add_node("test", lambda x: x)
compiled = graph.compile()

# Check for async methods
methods = [m for m in dir(compiled) if not m.startswith('_')]
print("Available methods on compiled graph:")
for method in sorted(methods):
    print(f"  {method}")

if 'ainvoke' in methods:
    print("\n✓ ainvoke available")
if 'invoke' in methods:
    print("✓ invoke available")
EOF

Repository: Abernaughty/agent-dev

Length of output: 186


🏁 Script executed:

grep -r 'run_task(' dev-suite/ --include='*.py' | grep -v '^Binary' | head -20

Repository: Abernaughty/agent-dev

Length of output: 2031


🏁 Script executed:

sed -n '715,725p' dev-suite/src/orchestrator.py

Repository: Abernaughty/agent-dev

Length of output: 275


🏁 Script executed:

python3 << 'EOF'
# Check LangGraph documentation/behavior for async graphs
# Even without langgraph installed, we can reason about the error message

error_msg = 'TypeError: No synchronous function provided to "developer"'
print("Error analysis:")
print(f"- Error suggests async node 'developer' cannot be run with sync invoke()")
print(f"- This occurs when async nodes are in the graph but invoke() is used")
print(f"- Solutions:")
print(f"  1. Make all nodes sync (revert async)")
print(f"  2. Use ainvoke() or astream() instead of invoke()")
print(f"  3. Make run_task() async to await ainvoke()")
EOF

Repository: Abernaughty/agent-dev

Length of output: 373


Switch run_task() to async invocation to support async graph nodes.

developer_node() and qa_node() are async, but run_task() still uses workflow.invoke() (sync invocation) on line 751. This causes the CI error TypeError: No synchronous function provided to "developer". LangGraph requires .ainvoke() or .astream() for graphs containing async nodes. Make run_task() async and await workflow.ainvoke() instead of calling invoke().

The config: dict | None parameter type is correct and intentional—it's used to pass tools via config["configurable"]["tools"] (see _get_agent_tools() at line 282).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/orchestrator.py` at line 408, The current run_task function
must be converted to an async function to support async graph nodes (since
developer_node and qa_node are async); change its definition to async def
run_task(...), replace the synchronous workflow.invoke(...) call with await
workflow.ainvoke(...) (or await workflow.astream(...) if streaming is intended),
and update any call sites to await run_task(...) so callers handle the
coroutine; ensure you preserve the existing use of config: dict | None and
continue passing config["configurable"]["tools"] into _get_agent_tools() as
before.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dev-suite/tests/test_e2e.py (1)

140-141: ⚠️ Potential issue | 🔴 Critical

E2E pipeline still uses synchronous invocation with async nodes—must migrate to async path

The full-pipeline E2E tests call run_task() synchronously, but developer_node (line 408) and qa_node (line 525) are now async. Invoking a mixed async/sync StateGraph with .invoke() will fail with TypeError: No synchronous function provided to "developer". Either:

  • Convert run_task() to async and await it in E2E tests, or
  • Use workflow.ainvoke() instead of workflow.invoke() at line 751.

Node-level tests are correctly converted to async with @pytest.mark.asyncio, but the full-pipeline path at lines 140–545 remains synchronous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_e2e.py` around lines 140 - 141, The E2E test calls
run_task() synchronously but developer_node and qa_node are async, so update
run_task to use the async invocation path: either make run_task an async
function and await it from tests, or modify run_task to call workflow.ainvoke()
(instead of workflow.invoke()) when executing the StateGraph so async nodes are
awaited; locate run_task, developer_node, qa_node and the workflow.invoke call
and switch to the async pattern and update the E2E test to await run_task if you
convert it to async.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@dev-suite/tests/test_e2e.py`:
- Around line 140-141: The E2E test calls run_task() synchronously but
developer_node and qa_node are async, so update run_task to use the async
invocation path: either make run_task an async function and await it from tests,
or modify run_task to call workflow.ainvoke() (instead of workflow.invoke())
when executing the StateGraph so async nodes are awaited; locate run_task,
developer_node, qa_node and the workflow.invoke call and switch to the async
pattern and update the E2E test to await run_task if you convert it to async.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f92fbd4e-dc11-4eb6-a37c-3bc7eef817d7

📥 Commits

Reviewing files that changed from the base of the PR and between 5689745 and 04c28c2.

📒 Files selected for processing (1)
  • dev-suite/tests/test_e2e.py

…ndbox_validate_node (#80)

- developer_node and qa_node are now sync (def, not async def)
- Tool loop runs via _run_async() bridge when tools present
- No-tools path stays fully synchronous (zero regression)
- Fixed sandbox_validate_node: parsed_files=None guard, WARNING trace
- Fixes LangGraph sync invoke() TypeError for async nodes
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file dev-suite/tests/test_e2e.py currently contains an undefined
identifier RESTORE_FROM_MAIN which causes a lint/build failure; either restore
the real test file from main (replace the placeholder with the original test
logic) or delete test_e2e.py if it’s intentionally removed and you will rely on
test_e2e_live.py for E2E coverage; ensure the branch contains no stray
RESTORE_FROM_MAIN token (remove or replace it) so ruff F821 is resolved before
merging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2ae6ef23-4e9a-4fbf-a5c4-31f598093ac6

📥 Commits

Reviewing files that changed from the base of the PR and between 04c28c2 and 33969b9.

📒 Files selected for processing (1)
  • dev-suite/tests/test_e2e.py

…oad_mcp_config (#80)

- Developer/QA node tests now call nodes synchronously (no await)
- Fixed patch target: src.tools.load_mcp_config (not src.orchestrator)
- _execute_tool_call and _run_tool_loop tests remain async (they ARE async)
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
dev-suite/tests/test_e2e.py (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Replace the placeholder before merge.

Line 1 is still a bare RESTORE_FROM_MAIN token. That trips Ruff F821 and leaves this module with no runnable tests. Please either restore the real dev-suite/tests/test_e2e.py contents in this branch or delete the file if the suite is intentionally being retired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/tests/test_e2e.py` at line 1, The file contains a placeholder token
RESTORE_FROM_MAIN causing Ruff F821 and leaving no tests; replace that token by
restoring the original test module contents for dev-suite/tests/test_e2e.py
(reintroduce the actual test functions, imports, and fixtures previously
present) or, if the e2e suite is intentionally retired, remove
dev-suite/tests/test_e2e.py entirely and update any test suite config
referencing it; ensure the restored file defines runnable tests (e.g., proper
pytest test_ functions and imports) so linters and test discovery pass.
dev-suite/src/orchestrator.py (3)

239-243: ⚠️ Potential issue | 🟠 Major

Use LangChain’s public tool wrappers (ainvoke/invoke) instead of tool.coroutine.

Calling tool.coroutine(...) directly bypasses wrapper behavior and can break tool input/config handling.

Suggested fix
-        if hasattr(tool, "coroutine") and tool.coroutine:
-            result = await tool.coroutine(tool_args)
-        else:
-            result = tool.invoke(tool_args)
+        if hasattr(tool, "ainvoke"):
+            result = await tool.ainvoke(tool_args)
+        else:
+            result = tool.invoke(tool_args)
In LangChain Core 0.3.x, what is the recommended public API for async tool execution: `tool.coroutine(...)` or `await tool.ainvoke(...)`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/orchestrator.py` around lines 239 - 243, The code is calling
tool.coroutine(...) directly which bypasses LangChain's public wrappers; change
the async branch to await the public wrapper (await tool.ainvoke(tool_args)) and
keep the sync branch using tool.invoke(tool_args), so replace references to
tool.coroutine with tool.ainvoke to preserve input/config handling for tools
(keep returning ToolMessage(content=str(result), tool_call_id=tool_id)).

214-215: ⚠️ Potential issue | 🔴 Critical

Remove github_create_pr from the iterative Dev tool allowlist.

This enables a non-idempotent external write inside a retrying tool loop and can create duplicate/partial PRs before QA/safety gates complete.

Suggested fix
-DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff", "github_create_pr"}
+DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/orchestrator.py` around lines 214 - 215, DEV_TOOL_NAMES
currently includes "github_create_pr", which allows a non-idempotent external
write inside a retrying loop; remove "github_create_pr" from the DEV_TOOL_NAMES
set so only safe idempotent tools remain (leave QA_TOOL_NAMES unchanged). Update
the DEV_TOOL_NAMES definition (symbol DEV_TOOL_NAMES) to exclude
"github_create_pr" and run related tests/linters to ensure no other code relies
on that tool being in the dev allowlist.

267-267: ⚠️ Potential issue | 🟠 Major

Redact tool args/results before persisting to tool_calls_log.

args_preview/result_preview can include code or secrets (e.g., file contents, diffs) and are later surfaced downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dev-suite/src/orchestrator.py` at line 267, The current append to
tool_calls_log stores raw args/result previews which may contain secrets or
large code; change this to sanitize and redact before persisting by introducing
and using a sanitizer (e.g., sanitize_preview) when building the dict for
tool_calls_log.append: call it on tc.get("args", {}) and on tool_msg.content to
mask known sensitive keys, remove long file contents/diffs, and replace with a
fixed token or truncated summary, then truncate to 200 chars; update references
to tc and tool_msg usage in the append site so only sanitized/trimmed strings
are stored in args_preview and result_preview and mark success as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 249-270: The loop can skip entirely when max_turns <= 0 leaving
`response` undefined; in _run_tool_loop validate max_turns at the start (or
handle the case before the for-loop) and return a sensible result instead of
relying on `response` being set: append a trace entry like f"{agent_name}: tool
loop skipped due to max_turns={max_turns}", log a warning via logger, and return
the last message from the incoming `messages` (e.g., messages[-1] or a fallback
empty response), current tokens_used, and an empty tool_calls_log so callers of
_run_tool_loop never receive an undefined `response`.

---

Duplicate comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 239-243: The code is calling tool.coroutine(...) directly which
bypasses LangChain's public wrappers; change the async branch to await the
public wrapper (await tool.ainvoke(tool_args)) and keep the sync branch using
tool.invoke(tool_args), so replace references to tool.coroutine with
tool.ainvoke to preserve input/config handling for tools (keep returning
ToolMessage(content=str(result), tool_call_id=tool_id)).
- Around line 214-215: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside a retrying loop; remove
"github_create_pr" from the DEV_TOOL_NAMES set so only safe idempotent tools
remain (leave QA_TOOL_NAMES unchanged). Update the DEV_TOOL_NAMES definition
(symbol DEV_TOOL_NAMES) to exclude "github_create_pr" and run related
tests/linters to ensure no other code relies on that tool being in the dev
allowlist.
- Line 267: The current append to tool_calls_log stores raw args/result previews
which may contain secrets or large code; change this to sanitize and redact
before persisting by introducing and using a sanitizer (e.g., sanitize_preview)
when building the dict for tool_calls_log.append: call it on tc.get("args", {})
and on tool_msg.content to mask known sensitive keys, remove long file
contents/diffs, and replace with a fixed token or truncated summary, then
truncate to 200 chars; update references to tc and tool_msg usage in the append
site so only sanitized/trimmed strings are stored in args_preview and
result_preview and mark success as before.

In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file contains a placeholder token RESTORE_FROM_MAIN causing Ruff
F821 and leaving no tests; replace that token by restoring the original test
module contents for dev-suite/tests/test_e2e.py (reintroduce the actual test
functions, imports, and fixtures previously present) or, if the e2e suite is
intentionally retired, remove dev-suite/tests/test_e2e.py entirely and update
any test suite config referencing it; ensure the restored file defines runnable
tests (e.g., proper pytest test_ functions and imports) so linters and test
discovery pass.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39f31fc9-42e8-4845-a135-5caae6db58e2

📥 Commits

Reviewing files that changed from the base of the PR and between 33969b9 and 29ad3c4.

📒 Files selected for processing (2)
  • dev-suite/src/orchestrator.py
  • dev-suite/tests/test_e2e.py

Fix 2: Remove github_create_pr from DEV_TOOL_NAMES (non-idempotent in retry loop)
Fix 3: Use ainvoke/invoke public API instead of tool.coroutine
Fix 4: Add _sanitize_preview() to redact secrets from tool_calls_log
Fix 5: Guard max_turns <= 0 in _run_tool_loop
Fix 8: Add design note comment explaining _run_async sync bridge choice
Add import re for secret pattern matching
Track _dev_tool_baselines per task_id so _handle_developer only counts
tool calls from the current developer pass, not cumulative across retries.
- FakeTool now uses ainvoke/invoke API (Fix 3), add SyncOnlyFakeTool
- test_dev_tools_filtered_to_dev_set expects github_create_pr excluded (Fix 2)
- test_dev_has_no_pr_creation: new test for Fix 2
- TestSanitizePreview: 6 new tests for Fix 4
- test_max_turns_zero_returns_immediately: new test for Fix 5
… ValidationPlan

The compacted sandbox_validate_node was calling format_validation_summary(result)
where result is a SandboxResult, but the function expects a ValidationPlan.
Reverted to direct field access: result.exit_code, result.tests_passed,
result.tests_failed — matching the original pre-PR behavior.
@Abernaughty Abernaughty merged commit 380ad12 into main Apr 1, 2026
3 checks passed
@Abernaughty Abernaughty deleted the feat/agent-tool-binding branch April 1, 2026 04:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Agent tool binding — give Dev and QA agents access to workspace tools

1 participant