feat: Agent tool binding — give Dev and QA agents access to workspace tools (#80) #83
Abernaughty merged 17 commits into main
Conversation
…tools_config (#80)
- Add tool_calls_log to GraphState and AgentState
- Add MAX_TOOL_TURNS config (default 10)
- Add DEV_TOOL_NAMES and QA_TOOL_NAMES role-based tool sets
- Add _get_agent_tools() to extract tools from RunnableConfig
- Add _execute_tool_call() for individual tool execution
- Add _run_tool_loop() for iterative tool calling
- Convert developer_node to async with tool binding support
- Convert qa_node to async with read-only tool subset
- Add init_tools_config() factory for workspace tool setup
- Update run_task() to include tools in invoke config
- Graceful fallback: no tools = original single-shot behavior
Note: Reviews paused. This branch appears to be under active development, so CodeRabbit has automatically paused this review to avoid overwhelming you with comments from an influx of new commits. This behavior is configurable in the CodeRabbit settings.
📝 Walkthrough
Adds workspace tool support: tools are initialized and injected into workflows; Dev/QA LLMs are bound to role-filtered tools; an iterative tool-execution loop records tool calls, emits TOOL_CALL SSE events, and developer/qa nodes accept an optional config.
Sequence Diagram(s)
sequenceDiagram
participant TaskRunner
participant Orchestrator
participant LLM as "LLM (bound)"
participant ToolProvider
participant EventBus
TaskRunner->>Orchestrator: init_tools_config(workspace_root)
Orchestrator->>ToolProvider: load tools (mcp-config.json)
ToolProvider-->>Orchestrator: tools[]
Orchestrator-->>TaskRunner: tools_config
TaskRunner->>Orchestrator: run_task(config includes tools_config)
Orchestrator->>LLM: bind_tools(tools_for_role)
Orchestrator->>LLM: invoke(messages)
LLM-->>Orchestrator: response (may include tool_calls)
loop tool call loop (<= MAX_TOOL_TURNS)
Orchestrator->>ToolProvider: execute_tool(tool_call)
ToolProvider-->>Orchestrator: tool_result
Orchestrator->>LLM: invoke(messages + ToolMessage(tool_result))
LLM-->>Orchestrator: response (more tool_calls or final)
Orchestrator->>EventBus: publish TOOL_CALL event
EventBus-->>TaskRunner: TOOL_CALL SSE event
end
Orchestrator-->>TaskRunner: final state (includes tool_calls_log)
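The loop in the diagram reduces to a small pattern: invoke the bound LLM, execute any requested tool calls, feed results back, and repeat until there are no more tool calls or the turn cap is hit. A minimal sketch of that pattern (FakeLLM, the dict-shaped messages, and the filesystem_read stub are illustrative stand-ins, not the repo's actual classes):

```python
import asyncio

MAX_TOOL_TURNS = 10  # mirrors the PR's config default

class FakeLLM:
    """Stand-in for a tool-bound LLM: asks for one tool call, then finishes."""
    def __init__(self):
        self.turn = 0

    async def ainvoke(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"content": "", "tool_calls": [
                {"id": "c1", "name": "filesystem_read", "args": {"path": "README.md"}}]}
        return {"content": "done", "tool_calls": []}

async def run_tool_loop(llm, tools, messages, max_turns=MAX_TOOL_TURNS):
    """Iterate: invoke LLM, execute requested tool calls, feed results back."""
    tool_calls_log = []
    response = await llm.ainvoke(messages)
    for _ in range(max_turns):
        calls = response.get("tool_calls") or []
        if not calls:
            break  # final answer, no further tool use requested
        for tc in calls:
            result = tools[tc["name"]](**tc["args"])
            tool_calls_log.append({"tool": tc["name"], "success": True})
            messages.append({"role": "tool", "tool_call_id": tc["id"],
                             "content": str(result)})
        response = await llm.ainvoke(messages)
    return response, tool_calls_log

response, log = asyncio.run(
    run_tool_loop(FakeLLM(), {"filesystem_read": lambda path: f"<{path}>"}, []))
print(response["content"], len(log))  # -> done 1
```

The same shape also gives the graceful fallback for free: with no tool calls in the first response, the loop exits immediately and behaves like the original single-shot path.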
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
…a_node (#80)
developer_node and qa_node are now async (issue #80). Updated 4 tests:
- test_developer_node_generates_code -> async + await
- test_qa_node_returns_pass -> async + await
- test_qa_node_returns_fail -> async + await
- test_qa_node_returns_escalate -> async + await
Pipeline tests (run_task) remain sync — LangGraph handles async nodes internally via workflow.invoke().
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/api/events.py`:
- Around line 52-53: The dashboard doesn't recognize the new
QA_ESCALATION/TOOL_CALL event because the dashboard type union and SSE wiring
lack 'tool_call'; update the dashboard API types (add 'tool_call' to the union
in api type definitions) and add handling in the SSE client (subscribe for
'tool_call' in the SSE handling code and dispatch it to the rest of the app
using the same window event pattern used for other events so the dashboard
receives real-time updates). Ensure you reference the same event string
"tool_call" in the type union, the SSE subscription switch/handler, and the
window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.
In `@dev-suite/src/api/runner.py`:
- Around line 337-350: The current logic uses the cumulative tool_calls_log to
compute dev_tool_calls, causing counts to include previous retries; to fix,
record the baseline index/length of output["tool_calls_log"] at the start of the
developer pass (e.g., save a variable like dev_pass_tool_calls_start =
len(output.get("tool_calls_log", [])) before generating code), and when
computing dev_tool_calls replace the filter with only the new slice
output["tool_calls_log"][dev_pass_tool_calls_start:] (then filter by
tc.get("agent") == "developer") so the action string uses only tool calls from
the current developer pass; update any references in the code block that builds
action/event_type/retry to use this per-pass dev_tool_calls.
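The per-pass baseline fix described above can be sketched as follows (count_dev_calls_this_pass and the dict shapes are hypothetical names for illustration; the real runner builds the action string inline):

```python
def count_dev_calls_this_pass(output, baseline):
    """Count only developer tool calls made since `baseline` (start of this pass)."""
    new_calls = output.get("tool_calls_log", [])[baseline:]
    return sum(1 for tc in new_calls if tc.get("agent") == "developer")

# Simulated state: two calls left over from a previous retry, then one new call.
output = {"tool_calls_log": [{"agent": "developer"}, {"agent": "qa"}]}
baseline = len(output["tool_calls_log"])  # recorded before the developer pass runs
output["tool_calls_log"].append({"agent": "developer"})  # call made during this pass
print(count_dev_calls_this_pass(output, baseline))  # -> 1
```

Slicing from the recorded baseline means the count resets naturally on each retry, while the cumulative tool_calls_log stays intact for the final state.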
In `@dev-suite/src/orchestrator.py`:
- Around line 619-623: The call site in orchestrator.py is dereferencing
plan.timeout which raises AttributeError because ValidationPlan lacks a timeout
attribute; fix by either adding a timeout field/default to the ValidationPlan
class or stop passing plan.timeout here and let _run_sandbox_validation() use
its own default. Locate the ValidationPlan definition (class ValidationPlan) and
add an optional timeout attribute with a reasonable default, or remove the
timeout argument from the _run_sandbox_validation(...) call in the
sandbox_validate path so the helper's internal default is used; update any
callers/tests accordingly to keep behavior consistent.
- Around line 345-350: tool_calls_log currently saves raw previews in
args_preview and result_preview which can leak file contents or secrets (e.g.,
from filesystem_write, filesystem_read, github_read_diff) into GraphState and
TOOL_CALL events; instead, sanitize and persist only redacted metadata: for
args_preview derive and store non-sensitive fields such as filenames, sizes,
content hashes or a fixed "[REDACTED]" tag when any arg key indicates
file/content, and for result_preview detect large/plaintext content and replace
with metadata like "type":"file"|"text", "size":N, "hash":H or "[REDACTED]" if
content appears to contain source/secrets, updating the tool_calls_log
population logic to call a redaction helper (e.g., redact_tool_preview) before
assigning args_preview/result_preview so no raw content is written to GraphState
or emitted in TOOL_CALL.
- Around line 268-274: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside retrying LLM loops; remove
"github_create_pr" from the DEV_TOOL_NAMES set to prevent automatic PR creation
during iterative tool runs, and instead move PR creation to an explicit
post-review flow or wrap it behind an approval/confirmation guard; update any
code paths or tests that reference DEV_TOOL_NAMES to account for its removal and
ensure PR creation is only invoked from the designated
post-review/human-approved function.
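Role-based filtering with github_create_pr dropped from the Dev set might look like this (QA_TOOL_NAMES and filter_tools_for_role are assumptions for illustration; only the DEV_TOOL_NAMES contents come from the review's suggested fix):

```python
# Dev allowlist per the review's recommendation: the non-idempotent
# github_create_pr is excluded from the iterative loop.
DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list",
                  "github_read_diff"}
# Assumed read-only QA subset (the exact QA set is not shown in this thread).
QA_TOOL_NAMES = {"filesystem_read", "filesystem_list", "github_read_diff"}

def filter_tools_for_role(all_tools, allowed_names):
    """Return only the tools whose name is in the role's allowlist."""
    return [t for t in all_tools if t["name"] in allowed_names]

tools = [{"name": n} for n in ("filesystem_read", "filesystem_write",
                               "github_create_pr")]
dev_tools = filter_tools_for_role(tools, DEV_TOOL_NAMES)
print(sorted(t["name"] for t in dev_tools))  # -> ['filesystem_read', 'filesystem_write']
```

PR creation then lives outside the loop entirely, invoked only from an explicit post-review or human-approved step.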
- Around line 305-310: The current code calls the tool's internal coroutine or
invoke directly (checking tool.coroutine and calling tool.coroutine(tool_args))
which bypasses the tool wrapper and can break tools expecting unpacked kwargs;
update the call path to use the public async wrapper await
tool.ainvoke(tool_args) when awaiting tools, falling back to
tool.invoke(tool_args) only for sync tools, and then return
ToolMessage(content=str(result), tool_call_id=tool_id); specifically replace
references to tool.coroutine and direct coroutine invocation with await
tool.ainvoke(tool_args) while keeping tool.invoke(tool_args) for synchronous
calls so the tool wrapper/input handling remains intact.
- Line 408: The current run_task function must be converted to an async function
to support async graph nodes (since developer_node and qa_node are async);
change its definition to async def run_task(...), replace the synchronous
workflow.invoke(...) call with await workflow.ainvoke(...) (or await
workflow.astream(...) if streaming is intended), and update any call sites to
await run_task(...) so callers handle the coroutine; ensure you preserve the
existing use of config: dict | None and continue passing
config["configurable"]["tools"] into _get_agent_tools() as before.
In `@dev-suite/tests/test_tool_binding.py`:
- Around line 340-341: The test patches the names in src.orchestrator but
init_tools_config imports load_mcp_config, create_provider and get_tools inside
the function, so those patches miss; update the test to patch the actual
defining module (patch "src.tools.load_mcp_config", "src.tools.create_provider",
and "src.tools.get_tools" or the correct modules where those functions are
implemented) or alternatively refactor init_tools_config to hoist the imports so
patch("src.orchestrator.load_mcp_config") etc. will work; locate the symbols
init_tools_config, load_mcp_config, create_provider, and get_tools to apply the
change.
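The "patch the defining module" point can be demonstrated in miniature (fake_tools stands in for src.tools; the principle is that a function-local import resolves the name from the defining module at call time, so mock.patch must target that module, not the importer):

```python
import sys
import types
from unittest import mock

# Build a stand-in defining module, mirroring the src.tools layout from the review.
tools_mod = types.ModuleType("fake_tools")
tools_mod.get_tools = lambda: ["real_tool"]
sys.modules["fake_tools"] = tools_mod

def init_tools_config():
    # Import inside the function body, as in the PR: the name is looked up
    # on the defining module each time this function runs.
    from fake_tools import get_tools
    return {"tools": get_tools()}

# Patching the *defining* module works even for function-local imports;
# patching the importing module (e.g. src.orchestrator) would miss entirely.
with mock.patch("fake_tools.get_tools", return_value=["mock_tool"]):
    print(init_tools_config())  # -> {'tools': ['mock_tool']}
```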
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 45ce17d1-9ce1-43d5-ad14-fef29bb1520c
📒 Files selected for processing (5)
.branch-placeholder
dev-suite/src/api/events.py
dev-suite/src/api/runner.py
dev-suite/src/orchestrator.py
dev-suite/tests/test_tool_binding.py
💤 Files with no reviewable changes (1)
- .branch-placeholder
QA_ESCALATION = "qa_escalation"
TOOL_CALL = "tool_call"
Update the dashboard SSE contract for tool_call.
This enum addition is still invisible to the browser: dashboard/src/lib/types/api.ts (Lines 172-177) excludes 'tool_call', and dashboard/src/lib/sse.ts (Lines 44-80 and 137-150) neither subscribes to nor dispatches it. The runner can publish the event, but the dashboard will drop it, so Issue #80's visibility goal stays broken. Based on learnings: Use SSE client with window event dispatch for real-time dashboard updates from backend.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dev-suite/src/api/events.py` around lines 52 - 53, The dashboard doesn't
recognize the new QA_ESCALATION/TOOL_CALL event because the dashboard type union
and SSE wiring lack 'tool_call'; update the dashboard API types (add 'tool_call'
to the union in api type definitions) and add handling in the SSE client
(subscribe for 'tool_call' in the SSE handling code and dispatch it to the rest
of the app using the same window event pattern used for other events so the
dashboard receives real-time updates). Ensure you reference the same event
string "tool_call" in the type union, the SSE subscription switch/handler, and
the window.dispatchEvent/new CustomEvent dispatch so published runner events are
accepted and emitted to the UI.
dev-suite/src/orchestrator.py
Outdated
- def developer_node(state: GraphState) -> dict:
+ async def developer_node(state: GraphState, config: dict | None = None) -> dict:
      """Lead Dev: executes the Blueprint and generates code."""
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'async def developer_node|async def qa_node|config: dict \| None|workflow\.invoke\(' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 301
🏁 Script executed:
sed -n '741,760p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1701
🏁 Script executed:
head -40 dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1460
🏁 Script executed:
rg -n 'def run_task|RunnableConfig' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 214
🏁 Script executed:
rg -n 'from langchain|from langgraph|import.*RunnableConfig' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 269
🏁 Script executed:
rg -n 'run_task\(' dev-suite/src/orchestrator.py | head -20
Repository: Abernaughty/agent-dev
Length of output: 149
🏁 Script executed:
wc -l dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 98
🏁 Script executed:
sed -n '400,415p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1163
🏁 Script executed:
sed -n '520,535p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 948
🏁 Script executed:
sed -n '1,50p' dev-suite/src/orchestrator.py | tail -20
Repository: Abernaughty/agent-dev
Length of output: 700
🏁 Script executed:
sed -n '30,35p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 285
🏁 Script executed:
rg -n 'ainvoke|astream' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 134
🏁 Script executed:
python3 << 'EOF'
# Check if RunnableConfig is available in langchain_core
try:
from langchain_core.runnables import RunnableConfig
print("RunnableConfig is available in langchain_core.runnables")
except ImportError:
print("RunnableConfig not found in langchain_core.runnables")
EOF
Repository: Abernaughty/agent-dev
Length of output: 117
🏁 Script executed:
python3 << 'EOF'
# Check various possible locations for RunnableConfig
locations = [
"langchain_core.runnables",
"langchain_core.runnables.config",
"langgraph.types",
"langgraph.graph",
]
for loc in locations:
try:
module_parts = loc.rsplit(".", 1)
if len(module_parts) == 2:
module, name = module_parts
exec(f"from {module} import {name}")
else:
exec(f"import {loc}")
print(f"✓ Found in {loc}")
except ImportError:
print(f"✗ Not in {loc}")
EOF
Repository: Abernaughty/agent-dev
Length of output: 189
🏁 Script executed:
sed -n '741,755p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1701
🏁 Script executed:
rg -n 'def create_workflow' dev-suite/src/orchestrator.py -A 5
Repository: Abernaughty/agent-dev
Length of output: 210
🏁 Script executed:
python3 << 'EOF'
# Check what version of langchain and langgraph are available
try:
import langchain_core
print(f"langchain_core version: {langchain_core.__version__ if hasattr(langchain_core, '__version__') else 'unknown'}")
except:
print("langchain_core version: could not determine")
try:
import langgraph
print(f"langgraph version: {langgraph.__version__ if hasattr(langgraph, '__version__') else 'unknown'}")
except:
print("langgraph version: could not determine")
EOF
Repository: Abernaughty/agent-dev
Length of output: 147
🏁 Script executed:
rg -n 'class.*Config|RunnableConfig' dev-suite/src/ --type python -A 2 | head -40
Repository: Abernaughty/agent-dev
Length of output: 95
🏁 Script executed:
rg -n 'config' dev-suite/src/orchestrator.py | grep -E 'developer_node|qa_node' -B 2 -A 2
Repository: Abernaughty/agent-dev
Length of output: 519
🏁 Script executed:
grep -r 'RunnableConfig' dev-suite/ 2>/dev/null || echo "No RunnableConfig found in codebase"
Repository: Abernaughty/agent-dev
Length of output: 231
🏁 Script executed:
sed -n '408,430p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 1317
🏁 Script executed:
sed -n '280,295p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 487
🏁 Script executed:
rg -n '_get_agent_tools' dev-suite/src/orchestrator.py -A 10 | head -20
Repository: Abernaughty/agent-dev
Length of output: 852
🏁 Script executed:
sed -n '1,80p' dev-suite/src/orchestrator.py | tail -30
Repository: Abernaughty/agent-dev
Length of output: 850
🏁 Script executed:
rg -n 'from langgraph' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 118
🏁 Script executed:
python3 << 'EOF'
# Check what methods a compiled graph has
from langgraph.graph import StateGraph
from langgraph.graph import END, START
import inspect
# Create a minimal graph to inspect its methods
graph = StateGraph(dict)
graph.add_node("test", lambda x: x)
compiled = graph.compile()
# Check for async methods
methods = [m for m in dir(compiled) if not m.startswith('_')]
print("Available methods on compiled graph:")
for method in sorted(methods):
print(f" {method}")
if 'ainvoke' in methods:
print("\n✓ ainvoke available")
if 'invoke' in methods:
print("✓ invoke available")
EOF
Repository: Abernaughty/agent-dev
Length of output: 186
🏁 Script executed:
grep -r 'run_task(' dev-suite/ --include='*.py' | grep -v '^Binary' | head -20
Repository: Abernaughty/agent-dev
Length of output: 2031
🏁 Script executed:
sed -n '715,725p' dev-suite/src/orchestrator.py
Repository: Abernaughty/agent-dev
Length of output: 275
🏁 Script executed:
python3 << 'EOF'
# Check LangGraph documentation/behavior for async graphs
# Even without langgraph installed, we can reason about the error message
error_msg = 'TypeError: No synchronous function provided to "developer"'
print("Error analysis:")
print(f"- Error suggests async node 'developer' cannot be run with sync invoke()")
print(f"- This occurs when async nodes are in the graph but invoke() is used")
print(f"- Solutions:")
print(f" 1. Make all nodes sync (revert async)")
print(f" 2. Use ainvoke() or astream() instead of invoke()")
print(f" 3. Make run_task() async to await ainvoke()")
EOF
Repository: Abernaughty/agent-dev
Length of output: 373
Switch run_task() to async invocation to support async graph nodes.
developer_node() and qa_node() are async, but run_task() still uses workflow.invoke() (sync invocation) on line 751. This causes the CI error TypeError: No synchronous function provided to "developer". LangGraph requires .ainvoke() or .astream() for graphs containing async nodes. Make run_task() async and await workflow.ainvoke() instead of calling invoke().
The config: dict | None parameter type is correct and intentional—it's used to pass tools via config["configurable"]["tools"] (see _get_agent_tools() at line 282).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dev-suite/src/orchestrator.py` at line 408, The current run_task function
must be converted to an async function to support async graph nodes (since
developer_node and qa_node are async); change its definition to async def
run_task(...), replace the synchronous workflow.invoke(...) call with await
workflow.ainvoke(...) (or await workflow.astream(...) if streaming is intended),
and update any call sites to await run_task(...) so callers handle the
coroutine; ensure you preserve the existing use of config: dict | None and
continue passing config["configurable"]["tools"] into _get_agent_tools() as
before.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
dev-suite/tests/test_e2e.py (1)
140-141: ⚠️ Potential issue | 🔴 Critical
E2E pipeline still uses synchronous invocation with async nodes — must migrate to async path
The full-pipeline E2E tests call run_task() synchronously, but developer_node (line 408) and qa_node (line 525) are now async. Invoking a mixed async/sync StateGraph with .invoke() will fail with TypeError: No synchronous function provided to "developer". Either:
- Convert run_task() to async and await it in E2E tests, or
- Use workflow.ainvoke() instead of workflow.invoke() at line 751.
Node-level tests are correctly converted to async with @pytest.mark.asyncio, but the full-pipeline path at lines 140–545 remains synchronous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_e2e.py` around lines 140 - 141, The E2E test calls run_task() synchronously but developer_node and qa_node are async, so update run_task to use the async invocation path: either make run_task an async function and await it from tests, or modify run_task to call workflow.ainvoke() (instead of workflow.invoke()) when executing the StateGraph so async nodes are awaited; locate run_task, developer_node, qa_node and the workflow.invoke call and switch to the async pattern and update the E2E test to await run_task if you convert it to async.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@dev-suite/tests/test_e2e.py`:
- Around line 140-141: The E2E test calls run_task() synchronously but
developer_node and qa_node are async, so update run_task to use the async
invocation path: either make run_task an async function and await it from tests,
or modify run_task to call workflow.ainvoke() (instead of workflow.invoke())
when executing the StateGraph so async nodes are awaited; locate run_task,
developer_node, qa_node and the workflow.invoke call and switch to the async
pattern and update the E2E test to await run_task if you convert it to async.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f92fbd4e-dc11-4eb6-a37c-3bc7eef817d7
📒 Files selected for processing (1)
dev-suite/tests/test_e2e.py
…ndbox_validate_node (#80)
- developer_node and qa_node are now sync (def, not async def)
- Tool loop runs via _run_async() bridge when tools present
- No-tools path stays fully synchronous (zero regression)
- Fixed sandbox_validate_node: parsed_files=None guard, WARNING trace
- Fixes LangGraph sync invoke() TypeError for async nodes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file dev-suite/tests/test_e2e.py currently contains an undefined
identifier RESTORE_FROM_MAIN which causes a lint/build failure; either restore
the real test file from main (replace the placeholder with the original test
logic) or delete test_e2e.py if it’s intentionally removed and you will rely on
test_e2e_live.py for E2E coverage; ensure the branch contains no stray
RESTORE_FROM_MAIN token (remove or replace it) so ruff F821 is resolved before
merging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2ae6ef23-4e9a-4fbf-a5c4-31f598093ac6
📒 Files selected for processing (1)
dev-suite/tests/test_e2e.py
Actionable comments posted: 1
♻️ Duplicate comments (4)
dev-suite/tests/test_e2e.py (1)
1-1: ⚠️ Potential issue | 🔴 Critical
Replace the placeholder before merge.
Line 1 is still a bare RESTORE_FROM_MAIN token. That trips Ruff F821 and leaves this module with no runnable tests. Please either restore the real dev-suite/tests/test_e2e.py contents in this branch or delete the file if the suite is intentionally being retired.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_e2e.py` at line 1, The file contains a placeholder token RESTORE_FROM_MAIN causing Ruff F821 and leaving no tests; replace that token by restoring the original test module contents for dev-suite/tests/test_e2e.py (reintroduce the actual test functions, imports, and fixtures previously present) or, if the e2e suite is intentionally retired, remove dev-suite/tests/test_e2e.py entirely and update any test suite config referencing it; ensure the restored file defines runnable tests (e.g., proper pytest test_ functions and imports) so linters and test discovery pass.
dev-suite/src/orchestrator.py (3)
239-243: ⚠️ Potential issue | 🟠 Major
Use LangChain’s public tool wrappers (ainvoke/invoke) instead of tool.coroutine.
Calling tool.coroutine(...) directly bypasses wrapper behavior and can break tool input/config handling.
Suggested fix
- if hasattr(tool, "coroutine") and tool.coroutine:
-     result = await tool.coroutine(tool_args)
- else:
-     result = tool.invoke(tool_args)
+ if hasattr(tool, "ainvoke"):
+     result = await tool.ainvoke(tool_args)
+ else:
+     result = tool.invoke(tool_args)
In LangChain Core 0.3.x, what is the recommended public API for async tool execution: `tool.coroutine(...)` or `await tool.ainvoke(...)`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` around lines 239 - 243, The code is calling tool.coroutine(...) directly which bypasses LangChain's public wrappers; change the async branch to await the public wrapper (await tool.ainvoke(tool_args)) and keep the sync branch using tool.invoke(tool_args), so replace references to tool.coroutine with tool.ainvoke to preserve input/config handling for tools (keep returning ToolMessage(content=str(result), tool_call_id=tool_id)).
214-215: ⚠️ Potential issue | 🔴 Critical
Remove github_create_pr from the iterative Dev tool allowlist.
This enables a non-idempotent external write inside a retrying tool loop and can create duplicate/partial PRs before QA/safety gates complete.
Suggested fix
- DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff", "github_create_pr"}
+ DEV_TOOL_NAMES = {"filesystem_read", "filesystem_write", "filesystem_list", "github_read_diff"}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` around lines 214 - 215, DEV_TOOL_NAMES currently includes "github_create_pr", which allows a non-idempotent external write inside a retrying loop; remove "github_create_pr" from the DEV_TOOL_NAMES set so only safe idempotent tools remain (leave QA_TOOL_NAMES unchanged). Update the DEV_TOOL_NAMES definition (symbol DEV_TOOL_NAMES) to exclude "github_create_pr" and run related tests/linters to ensure no other code relies on that tool being in the dev allowlist.
267-267: ⚠️ Potential issue | 🟠 Major
Redact tool args/results before persisting to tool_calls_log.
args_preview/result_preview can include code or secrets (e.g., file contents, diffs) and are later surfaced downstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` at line 267, The current append to tool_calls_log stores raw args/result previews which may contain secrets or large code; change this to sanitize and redact before persisting by introducing and using a sanitizer (e.g., sanitize_preview) when building the dict for tool_calls_log.append: call it on tc.get("args", {}) and on tool_msg.content to mask known sensitive keys, remove long file contents/diffs, and replace with a fixed token or truncated summary, then truncate to 200 chars; update references to tc and tool_msg usage in the append site so only sanitized/trimmed strings are stored in args_preview and result_preview and mark success as before.
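A redaction helper along the lines this comment asks for could look like the sketch below (SENSITIVE_KEYS and the token regexes are illustrative assumptions, not the PR's actual _sanitize_preview):

```python
import re

# Assumed key names that indicate file/secret content.
SENSITIVE_KEYS = {"content", "data", "diff", "token", "secret", "password"}
# Example token shapes (GitHub PAT, AWS access key) for pattern-based masking.
SECRET_PATTERN = re.compile(r"(ghp_[A-Za-z0-9]+|AKIA[A-Z0-9]{16})")

def sanitize_preview(value, max_len=200):
    """Redact likely-sensitive material and truncate before logging."""
    if isinstance(value, dict):
        return {k: ("[REDACTED]" if k.lower() in SENSITIVE_KEYS
                    else sanitize_preview(v, max_len))
                for k, v in value.items()}
    text = str(value)
    text = SECRET_PATTERN.sub("[REDACTED]", text)
    return text[:max_len]  # cap previews at 200 chars, per the prompt

args = {"path": "src/app.py", "content": "API_KEY = 'ghp_abc123'"}
print(sanitize_preview(args))  # -> {'path': 'src/app.py', 'content': '[REDACTED]'}
```

Applying such a helper to both args_preview and result_preview at the append site keeps filenames and sizes visible while keeping raw file contents out of GraphState and TOOL_CALL events.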
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 249-270: The loop can skip entirely when max_turns <= 0 leaving
`response` undefined; in _run_tool_loop validate max_turns at the start (or
handle the case before the for-loop) and return a sensible result instead of
relying on `response` being set: append a trace entry like f"{agent_name}: tool
loop skipped due to max_turns={max_turns}", log a warning via logger, and return
the last message from the incoming `messages` (e.g., messages[-1] or a fallback
empty response), current tokens_used, and an empty tool_calls_log so callers of
_run_tool_loop never receive an undefined `response`.
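The guard the prompt describes can be sketched as follows (names and return shape follow the prompt's wording; the real _run_tool_loop signature may differ):

```python
import logging

logger = logging.getLogger(__name__)

def run_tool_loop(agent_name, messages, tokens_used, max_turns, trace):
    """Guard: with max_turns <= 0 the for-loop body never runs, so return a
    defined fallback instead of falling through to an unset `response`."""
    if max_turns <= 0:
        trace.append(f"{agent_name}: tool loop skipped due to max_turns={max_turns}")
        logger.warning("tool loop skipped for %s (max_turns=%s)", agent_name, max_turns)
        return messages[-1] if messages else None, tokens_used, []
    # ... the normal iterative tool-calling loop would run here ...
    return messages[-1], tokens_used, []

trace = []
resp, tokens, log = run_tool_loop("developer", ["last message"], 42, 0, trace)
print(resp, tokens, log)  # -> last message 42 []
```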
---
Duplicate comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 239-243: The code is calling tool.coroutine(...) directly which
bypasses LangChain's public wrappers; change the async branch to await the
public wrapper (await tool.ainvoke(tool_args)) and keep the sync branch using
tool.invoke(tool_args), so replace references to tool.coroutine with
tool.ainvoke to preserve input/config handling for tools (keep returning
ToolMessage(content=str(result), tool_call_id=tool_id)).
- Around line 214-215: DEV_TOOL_NAMES currently includes "github_create_pr",
which allows a non-idempotent external write inside a retrying loop; remove
"github_create_pr" from the DEV_TOOL_NAMES set so only safe idempotent tools
remain (leave QA_TOOL_NAMES unchanged). Update the DEV_TOOL_NAMES definition
(symbol DEV_TOOL_NAMES) to exclude "github_create_pr" and run related
tests/linters to ensure no other code relies on that tool being in the dev
allowlist.
- Line 267: The current append to tool_calls_log stores raw args/result previews
which may contain secrets or large code; change this to sanitize and redact
before persisting by introducing and using a sanitizer (e.g., sanitize_preview)
when building the dict for tool_calls_log.append: call it on tc.get("args", {})
and on tool_msg.content to mask known sensitive keys, remove long file
contents/diffs, and replace with a fixed token or truncated summary, then
truncate to 200 chars; update references to tc and tool_msg usage in the append
site so only sanitized/trimmed strings are stored in args_preview and
result_preview and mark success as before.
In `@dev-suite/tests/test_e2e.py`:
- Line 1: The file contains a placeholder token RESTORE_FROM_MAIN causing Ruff
F821 and leaving no tests; replace that token by restoring the original test
module contents for dev-suite/tests/test_e2e.py (reintroduce the actual test
functions, imports, and fixtures previously present) or, if the e2e suite is
intentionally retired, remove dev-suite/tests/test_e2e.py entirely and update
any test suite config referencing it; ensure the restored file defines runnable
tests (e.g., proper pytest test_ functions and imports) so linters and test
discovery pass.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 39f31fc9-42e8-4845-a135-5caae6db58e2
📒 Files selected for processing (2)
dev-suite/src/orchestrator.py
dev-suite/tests/test_e2e.py
Fix 2: Remove github_create_pr from DEV_TOOL_NAMES (non-idempotent in retry loop)
Fix 3: Use ainvoke/invoke public API instead of tool.coroutine
Fix 4: Add _sanitize_preview() to redact secrets from tool_calls_log
Fix 5: Guard max_turns <= 0 in _run_tool_loop
Fix 8: Add design note comment explaining _run_async sync bridge choice
Add import re for secret pattern matching
Track _dev_tool_baselines per task_id so _handle_developer only counts tool calls from the current developer pass, not cumulative across retries.
- FakeTool now uses ainvoke/invoke API (Fix 3), add SyncOnlyFakeTool
- test_dev_tools_filtered_to_dev_set expects github_create_pr excluded (Fix 2)
- test_dev_has_no_pr_creation: new test for Fix 2
- TestSanitizePreview: 6 new tests for Fix 4
- test_max_turns_zero_returns_immediately: new test for Fix 5
… ValidationPlan
The compacted sandbox_validate_node was calling format_validation_summary(result) where result is a SandboxResult, but the function expects a ValidationPlan. Reverted to direct field access: result.exit_code, result.tests_passed, result.tests_failed — matching the original pre-PR behavior.
Summary
Closes #80. Gives the Dev and QA agents the ability to use tools (read files, write files, run commands) during their turn, transforming them from single-shot text generators into iterative, tool-using agents.
What Changed
orchestrator.py (Major)
- tool_calls_log added to GraphState and AgentState for tracking tool usage
- MAX_TOOL_TURNS config (default 10, env MAX_TOOL_TURNS) caps tool iterations
- DEV_TOOL_NAMES / QA_TOOL_NAMES — role-based tool sets. Dev gets read+write, QA gets read-only
- _get_agent_tools(config, allowed_names) — extracts tools from RunnableConfig["configurable"]["tools"]
- _execute_tool_call() — executes a single tool call, prefers async coroutine over sync
- _run_tool_loop() — iterative loop: LLM -> tool_calls -> execute -> feed results -> repeat until done or max_turns
- developer_node → now async, accepts config param. If tools available: bind_tools() + tool loop. If not: original single-shot behavior (zero regression)
- qa_node → now async, accepts config param. Gets read-only tool subset (max 5 turns). Same fallback pattern
- init_tools_config() — factory that creates a ToolProvider, loads tools from mcp-config.json, returns a RunnableConfig-compatible dict
- run_task() — now initializes tools config and merges into invoke config
events.py (Minor)
- TOOL_CALL = "tool_call" added to EventType enum
runner.py (Medium)
- init_tools_config imported from orchestrator
- TOOL_CALL SSE events emitted for each new tool call during streaming
- _handle_developer includes tool call count in timeline action
- New _emit_tool_call() method
test_tool_binding.py (New — 35 tests)
- _get_agent_tools: 8 tests (empty config, filtering, role subsets)
- _execute_tool_call: 5 tests (success, unknown tool, exception, async/sync)
- _run_tool_loop: 5 tests (no calls, single turn, max turns, token accumulation, failure)
- developer_node with tools: 5 tests (bind_tools, single-shot, no config, tool logging, filtering)
- qa_node with tools: 2 tests (read-only subset, without tools)
- init_tools_config: 3 tests (no file, invalid JSON, valid config)
Design Decisions
Testing
All files pass ast.parse() syntax validation. 35 unit tests with full mock coverage.
Summary by CodeRabbit
New Features
Improvements
Tests