fix(sandbox): validation strategy dispatch, sequential execution, run_script (#95) #98
Conversation
- Add ValidationStrategy enum (TEST_SUITE, SCRIPT_EXEC, LINT_ONLY, SKIP)
- Add strategy, script_file fields to ValidationPlan
- Detect test files via TEST_INDICATORS to select strategy
- SCRIPT_EXEC for single scripts without test suites
- Ruff guard: `which ruff` check before running lint
- Split PYTHON_COMMANDS into PYTHON_LINT_COMMANDS + PYTHON_TEST_COMMANDS
- Updated tests for new strategy selection logic
- Updated sandbox_validate_node tests for strategy dispatch
… warnings (#95)
- SandboxResult: add validation_skipped and warnings fields
- run_tests(): sequential command execution (drop && chain). Each command runs independently, with aggregated results. Missing tools (ruff) no longer block subsequent commands (pytest).
- run_script(): new method for SCRIPT_EXEC strategy. Executes `python <file>`, checks exit code + stdout. Reports as 1 passed/1 failed test for QA context.
- Parse __RESULTS__ JSON for per-command exit codes
- Detect rc=127 (command not found) and surface as warnings
Note: Reviews paused. This branch is under active development, so CodeRabbit has automatically paused its review to avoid an influx of comments from new commits.
📝 Walkthrough

Adds a ValidationStrategy-driven sandbox validation flow (SKIP, SCRIPT_EXEC, LINT_ONLY, TEST_SUITE); the orchestrator dispatches each plan to the matching runner path.

Changes
Sequence Diagram(s)

sequenceDiagram
participant Orchestrator as Orchestrator
participant ValidationCommands as ValidationCommands
participant E2BRunner as E2BRunner
participant Sandbox as Sandbox
Orchestrator->>ValidationCommands: get_validation_plan(target_files)
ValidationCommands-->>Orchestrator: plan { strategy, commands?, script_file? }
alt plan.strategy == SCRIPT_EXEC
Orchestrator->>E2BRunner: run_script(script_file, project_files, timeout)
E2BRunner->>Sandbox: write files + exec ["python", script_file]
Sandbox-->>E2BRunner: exit_code + output
E2BRunner-->>Orchestrator: SandboxResult(exit_code, tests_passed/failed, warnings)
else plan.strategy == TEST_SUITE
Orchestrator->>E2BRunner: run_tests(commands=[...], template?)
loop For each command
E2BRunner->>Sandbox: exec command (shell)
Sandbox-->>E2BRunner: rc + output
end
E2BRunner-->>Orchestrator: SandboxResult(exit_code, warnings)
else plan.strategy == SKIP
Orchestrator-->>Orchestrator: produce SandboxResult(validation_skipped=true, exit_code=0, output_summary)
end
Orchestrator->>Orchestrator: append warnings to trace & store result
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 599-601: The sandbox validation branch changed contract by
returning SandboxResult(validation_skipped=True) for SKIP/missing-key paths but
callers (qa_node and dev-suite/src/api/runner.py around runner.py:298-318) still
expect None for "no run", so revert those branches in sandbox_validate (the
block handling plan.strategy == ValidationStrategy.SKIP and the similar block at
lines ~630-633) to return None instead of a SandboxResult; keep
SandboxResult(...) only for actual runs so existing callers that check
"sandbox_result is not None" / "exit_code == 0" continue to behave correctly.
In `@dev-suite/src/sandbox/e2b_runner.py`:
- Around line 344-365: run_tests() currently uses the wrapper process exit code
instead of aggregating child command exit codes embedded in the "__RESULTS__"
JSON trailer, and _parse_test_output() reads the merged log without extracting
that trailer; update the execution and parsing flow so self.run() preserves the
full stdout (including the "__RESULTS__..."), then modify _parse_test_output()
to locate and json.parse the "__RESULTS__" trailer produced by run_code, compute
an overall exit_code (e.g., fail if any per-command rc != 0, or pick first
non-zero) and set result.exit_code to that aggregate before returning; reference
symbols: run_tests(), self.run(), _parse_test_output(), output_summary, and the
"__RESULTS__" trailer to guide where to change behavior.
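One way to compute the aggregate exit code from the `__RESULTS__` JSON trailer, as the comment suggests. The marker name comes from the PR; the record shape and the "fail when the trailer is missing" policy are assumptions:

```python
import json

RESULTS_MARKER = "__RESULTS__"


def aggregate_exit_code(stdout: str) -> int:
    """Locate the __RESULTS__ JSON trailer and return the first non-zero rc."""
    idx = stdout.rfind(RESULTS_MARKER)
    if idx == -1:
        # Trailer lost (e.g. truncated output): treat as failure, not a pass.
        return 1
    per_command = json.loads(stdout[idx + len(RESULTS_MARKER):])
    for record in per_command:
        if record["rc"] != 0:
            return record["rc"]
    return 0
```

Picking the first non-zero rc keeps the earliest failing command visible; any-non-zero-fails policies behave the same for the pass/fail decision.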
- Around line 370-376: The outer sandbox timeout passed to self.run (currently
timeout + 30) is too small for sequential child commands; change it to account
for the number of sequential commands so the outer timeout >= sum of per-command
timeouts plus slack. Compute the total expected timeout (for example
total_timeout = timeout * max(1, num_commands) + 30 or derive num_commands from
full_code/plan) and pass total_timeout instead of timeout + 30 to self.run
(retain profile=SandboxProfile.LOCKED and the same env_vars/template
parameters).
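The suggested scaling is a one-liner; a sketch (the 30-second slack figure comes from the comment itself):

```python
def overall_timeout(per_command_timeout: int, commands: list[str]) -> int:
    # The outer sandbox timeout must cover every sequential child command,
    # plus slack for sandbox startup and file writes.
    return per_command_timeout * max(1, len(commands)) + 30
```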
In `@dev-suite/src/sandbox/validation_commands.py`:
- Around line 90-121: The tests detection currently uses substring checks
(indicator in lower) in _has_test_files and _get_primary_script which falsely
flag names like latest_utils.py; update both functions to match test indicators
on filename/path boundaries instead—e.g., evaluate against the lowercase
basename and directory components, using startswith for prefixes like "test_"
and exact name matches like "tests" or suffix checks like "_test" (or use a
word-boundary regex) against TEST_INDICATORS so only true test files/dirs are
detected while leaving normal modules (e.g., latest_utils.py or
contest_helper.py) unaffected.
- Around line 75-77: The pytest invocation in PYTHON_TEST_COMMANDS is hardcoded
to "tests/" which mismatches _has_test_files() detection and can miss or fail to
run discovered tests; update the PYTHON_TEST_COMMANDS entry to remove the
explicit "tests/" path so pytest will auto-discover tests project-wide (e.g.,
use "python -m pytest -v --tb=short"), ensuring PYTHON_TEST_COMMANDS is the
updated string and leaving _has_test_files() logic unchanged.
- Around line 71-73: The current PYTHON_LINT_COMMANDS entry uses the shell chain
"which ruff && ruff check ... || echo ..." which masks ruff failures by turning
non-zero ruff exits into zero via the trailing ||; replace that single-line
chain with a shell if-then-else construct so ruff's exit code is preserved: test
for ruff with "if which ruff >/dev/null 2>&1; then ruff check . --select=E,F,W
--no-fix; else echo '[WARN] ruff not available -- lint skipped'; fi". Update the
PYTHON_LINT_COMMANDS list to use this if/then/else command string so validation
failures from ruff propagate.
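The fixed guard, written as a Python command-list entry of the kind `validation_commands.py` appears to use (the exact list shape is an assumption; the shell text follows the comment's suggestion, with `command -v` in place of `which` for POSIX portability):

```python
# if/then/else preserves ruff's exit code; the old
# `which ruff && ruff ... || echo ...` chain converted every non-zero
# ruff exit into 0 via the trailing `||`.
PYTHON_LINT_COMMANDS = [
    "if command -v ruff >/dev/null 2>&1; then "
    "ruff check . --select=E,F,W --no-fix; "
    "else echo '[WARN] ruff not available -- lint skipped'; fi"
]
```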
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 398d9b69-e125-411d-890a-7ef79c230846
📒 Files selected for processing (5)
- dev-suite/src/orchestrator.py
- dev-suite/src/sandbox/e2b_runner.py
- dev-suite/src/sandbox/validation_commands.py
- dev-suite/tests/test_sandbox_validate.py
- dev-suite/tests/test_validation_commands.py
CR #1 (Critical): Aggregate child exit codes from __RESULTS__ trailer
- run_tests() now computes aggregate_exit_code from per-command rc values
- Non-zero child exits properly propagate instead of being masked by wrapper

CR #2 (Critical): Scale outer timeout for sequential commands
- overall_timeout = max(len(commands), 1) * timeout + 30

CR #3 (Major): Will be fixed in orchestrator.py (pushed locally)
- SKIP/missing-key paths revert to returning None

CR #4 (Major): Fix ruff guard to preserve exit code
- Replaced A && B || C with if/then/else construct
- command -v ruff instead of which ruff

CR #5 (Major): Drop hardcoded tests/ from pytest
- python -m pytest -v --tb=short (auto-discover)

CR #6 (Major): Boundary-based test file detection
- _is_test_file() matches on basename boundaries
- Avoids false positives like latest_utils.py, contest_helper.py
- Added 6 regression tests for boundary matching
Actionable comments posted: 1
🧹 Nitpick comments (1)
dev-suite/tests/test_validation_commands.py (1)
184-188: Test name is misleading: it doesn't actually test `script_file` preference.

The docstring claims to test that `_get_primary_script` prefers non-test files, but since `test_main.py` is detected as a test file, the strategy becomes `TEST_SUITE` (line 188). This test never exercises the `script_file` selection logic.

To test `_get_primary_script`'s preference behavior, you'd need a `SCRIPT_EXEC` scenario with multiple candidate files (none of which are tests).

Suggested rename and additional test:

```diff
-    def test_script_file_prefers_non_test(self):
-        """_get_primary_script should prefer non-test files."""
+    def test_test_file_presence_triggers_test_suite(self):
+        """Presence of test file alongside main file triggers TEST_SUITE."""
         files = ["src/main.py", "test_main.py"]
         plan = get_validation_plan(files)
         assert plan.strategy == ValidationStrategy.TEST_SUITE
+
+    def test_script_file_selects_first_py_file(self):
+        """When multiple non-test .py files exist, script_file is the first one."""
+        files = ["src/secondary.py", "src/main.py"]
+        plan = get_validation_plan(files)
+        assert plan.strategy == ValidationStrategy.SCRIPT_EXEC
+        assert plan.script_file == "src/secondary.py"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/tests/test_validation_commands.py` around lines 184 - 188, The test test_script_file_prefers_non_test currently asserts ValidationStrategy.TEST_SUITE and never exercises _get_primary_script's preference logic; update the test to create a SCRIPT_EXEC scenario (e.g., use multiple non-test candidate files like "src/main.py" and "src/other.py" and call get_validation_plan) and assert the returned plan.strategy is ValidationStrategy.SCRIPT_EXEC and that the chosen primary script (via whatever field on the plan references the primary script) is the non-test preferred file; alternatively, rename the test to reflect it actually verifies test-suite detection and add a new test that calls get_validation_plan with multiple non-test files to verify _get_primary_script prefers the correct script.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/tests/test_validation_commands.py`:
- Around line 66-69: The _get_primary_script() logic currently uses
PYTHON_EXTENSIONS = {".py", ".pyi"} which incorrectly treats .pyi stubs as
executable; change the selection to only consider code extensions (e.g., replace
or split PYTHON_EXTENSIONS into CODE_EXTENSIONS = {".py"} and STUB_EXTENSIONS =
{".pyi"} or simply make _get_primary_script() use {".py"}), update the docstring
to reflect it returns only .py files, and ensure any callers that used
PYTHON_EXTENSIONS for execution (e.g., code that sets
ValidationStrategy.SCRIPT_EXEC) now use the code-only set so .pyi files are not
chosen as primary scripts.
---
Nitpick comments:
In `@dev-suite/tests/test_validation_commands.py`:
- Around line 184-188: The test test_script_file_prefers_non_test currently
asserts ValidationStrategy.TEST_SUITE and never exercises _get_primary_script's
preference logic; update the test to create a SCRIPT_EXEC scenario (e.g., use
multiple non-test candidate files like "src/main.py" and "src/other.py" and call
get_validation_plan) and assert the returned plan.strategy is
ValidationStrategy.SCRIPT_EXEC and that the chosen primary script (via whatever
field on the plan references the primary script) is the non-test preferred file;
alternatively, rename the test to reflect it actually verifies test-suite
detection and add a new test that calls get_validation_plan with multiple
non-test files to verify _get_primary_script prefers the correct script.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a118dde8-4783-42d4-af3e-a0a4ae0df524
📒 Files selected for processing (2)
- dev-suite/src/sandbox/validation_commands.py
- dev-suite/tests/test_validation_commands.py
🚧 Files skipped from review as they are similar to previous changes (1)
- dev-suite/src/sandbox/validation_commands.py
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 612-628: The dispatch in sandbox_validate misses handling
ValidationStrategy.LINT_ONLY so lint-only plans fall through to the skip branch;
add an explicit branch for plan.strategy == ValidationStrategy.LINT_ONLY that
treats it like TEST_SUITE by calling _run_sandbox_tests with plan.commands,
template, and parsed_files (same parameters as the TEST_SUITE branch) and append
an appropriate trace message (e.g., "sandbox_validate: running lint-only
{len(plan.commands)} commands"); ensure the existing TEST_SUITE branch remains
unchanged and that the LINT_ONLY branch returns the same result shape as other
strategies.
- Around line 603-625: The code dispatches SCRIPT_EXEC/TEST_SUITE even when
parsed_files is empty, causing spurious "file not found" / "no tests collected"
errors; change the sandbox_validate logic to detect an empty parsed_files (from
state.get("parsed_files", [])) and skip sandbox validation for both
ValidationStrategy.SCRIPT_EXEC and ValidationStrategy.TEST_SUITE by returning or
setting result to None and appending a trace message like "sandbox_validate:
skipping sandbox run - no materialized files"; specifically guard before calling
_run_sandbox_script and _run_sandbox_tests so neither function is invoked when
parsed_files is falsy.
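The guard could be factored out roughly as follows (strategy values shown as plain strings to keep the sketch self-contained):

```python
RUNNABLE_STRATEGIES = {"SCRIPT_EXEC", "TEST_SUITE"}


def should_skip_sandbox(strategy: str, parsed_files: list[str]) -> bool:
    """True when dispatching would run the sandbox against zero materialized files."""
    return strategy in RUNNABLE_STRATEGIES and not parsed_files
```

Checking this before calling `_run_sandbox_script` / `_run_sandbox_tests` avoids the spurious "file not found" and "no tests collected" errors.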
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ef363ee1-c4b5-47ca-b32a-76438b1154a9
📒 Files selected for processing (2)
- dev-suite/src/orchestrator.py
- dev-suite/tests/test_sandbox_validate.py
…, .pyi exclusion

CR fixes addressed:
- e2b_runner.py: aggregate exit_code from __RESULTS__ trailer instead of wrapper process exit code (Critical: child cmd failures were silent)
- e2b_runner.py: scale outer sandbox timeout to num_commands * timeout + 30 (Critical: sequential commands could be killed prematurely)
- validation_commands.py: exclude .pyi stubs from SCRIPT_EXEC selection via EXECUTABLE_PYTHON_EXTENSIONS; .pyi-only files route to LINT_ONLY
- test_validation_commands.py: update .pyi test for LINT_ONLY, rename misleading test, add script_file selection coverage, add .pyi+.py coexist test
…patch

CR fixes addressed:
- orchestrator.py: guard against empty parsed_files before dispatching SCRIPT_EXEC/TEST_SUITE/LINT_ONLY to sandbox (Major: spurious errors)
- orchestrator.py: fold LINT_ONLY into TEST_SUITE branch so lint-only plans execute their commands instead of falling through to skip (Minor)
Actionable comments posted: 4
♻️ Duplicate comments (1)
dev-suite/src/sandbox/e2b_runner.py (1)
374-380: ⚠️ Potential issue | 🔴 Critical

Parse the status trailer before `self.run()` truncates stdout.

Both paths recover the real exit status from markers appended at the end of child output, but `self.run()` trims `output_summary` before either parser runs. A noisy `ruff`/`pytest` run or chatty script can drop `__RESULTS__`/`__EXIT_CODE__`, leaving `aggregate_exit_code` or `script_exit` at `0` and reporting a false pass. Preserve the untruncated summary for internal parsing and only truncate after extracting these markers.

Also applies to: 395-406, 461-467, 471-475
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/sandbox/e2b_runner.py` around lines 374 - 380, The current flow calls self.run(...) which returns an output_summary that is immediately truncated before parsing for embedded markers like "__RESULTS__" and "__EXIT_CODE__", causing loss of trailing markers; change the logic in functions calling run (the blocks around the result = self.run(...) calls and the later parsing that computes aggregate_exit_code and script_exit) to first keep the full untruncated output_summary (e.g., store it as raw_output or full_summary), parse that raw text to extract the status trailer and compute aggregate_exit_code and script_exit from the "__RESULTS__"/"__EXIT_CODE__" markers, and only after successful extraction perform the truncation used for UI/display; apply the same change to the other occurrences referenced (the blocks around lines handling aggregate_exit_code and script_exit at the other call sites).
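Extract-then-truncate can be sketched as follows (the 4000-character display limit is an invented placeholder, not the runner's real limit):

```python
RESULTS_MARKER = "__RESULTS__"
MAX_SUMMARY_CHARS = 4000  # placeholder display limit


def split_summary(raw_output: str) -> tuple[str, str]:
    """Peel off the status trailer first, then truncate only the display text."""
    idx = raw_output.rfind(RESULTS_MARKER)
    if idx == -1:
        return raw_output[:MAX_SUMMARY_CHARS], ""
    trailer = raw_output[idx:]
    return raw_output[:idx][:MAX_SUMMARY_CHARS], trailer
```

The trailer survives regardless of how chatty the preceding output is; only the human-facing summary gets clipped.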
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dev-suite/src/sandbox/e2b_runner.py`:
- Around line 352-362: The per-command TimeoutExpired handling currently sets
rc=-1 and uses err="TIMEOUT" but doesn't expose a timed_out flag and the summary
drops the timeout marker; update the subprocess TimeoutExpired except block (the
subprocess.run call and results list entries) to append a result object that
includes a dedicated timed_out: True field (e.g. {"cmd": cmd, "rc": -1, "out":
"", "err": "TIMEOUT", "timed_out": True}) so timeouts are surfaced as explicit
booleans, and adjust any downstream summary/printing logic that reads results
(the r["err"]/r["rc"] checks and the overall sandbox timed_out calculation) to
respect per-command timed_out; apply the identical change to the other
occurrence mentioned (lines 417-419).
- Around line 400-405: The loop over cmd_results currently turns rc==127 into
only a warning (warnings.append) which masks missing runtimes/tools; change the
logic in the for cr in cmd_results loop (variables: rc, cmd_results, warnings,
aggregate_exit_code) so that a 127 exit code is treated as a non-zero failure:
either remove the special-case branch for rc == 127 so it falls through to the
existing elif rc != 0 branch (which sets aggregate_exit_code), or if you keep a
warning for 127 also ensure you set aggregate_exit_code = 127 when
aggregate_exit_code == 0; preserve the existing behavior that only the first
non-zero rc updates aggregate_exit_code.
In `@dev-suite/src/sandbox/validation_commands.py`:
- Around line 240-248: get_validation_plan can return
ValidationStrategy.LINT_ONLY for .pyi-only inputs but the orchestrator currently
only handles SCRIPT_EXEC and TEST_SUITE, so LINT_ONLY plans are never executed;
update the orchestrator's dispatch/run_validation_plan logic to handle
ValidationStrategy.LINT_ONLY by invoking the same lint execution path used for
other strategies (use the ValidationPlan.commands list and template from the
returned ValidationPlan) so lint-only validation runs for .pyi inputs; reference
get_validation_plan, ValidationPlan, and ValidationStrategy.LINT_ONLY when
locating the related code to implement this branch.
- Around line 200-209: The current has_frontend and has_python branch returns a
ValidationPlan with ValidationStrategy.TEST_SUITE and FULLSTACK_COMMANDS
unconditionally, which reintroduces running pytest even when there are no test
files; update the branch in the function that builds the ValidationPlan to call
the existing _has_test_files(target_files) (or equivalent) and only include the
pytest invocation in the commands if tests are present, otherwise either (a)
remove/replace the pytest entry from FULLSTACK_COMMANDS for this case or (b)
select a non-test suite command set (e.g., a lint/build-only plan); adjust the
logic around ValidationStrategy.TEST_SUITE and the commands list so python -m
pytest is not run when _has_test_files() returns False.
---
Duplicate comments:
In `@dev-suite/src/sandbox/e2b_runner.py`:
- Around line 374-380: The current flow calls self.run(...) which returns an
output_summary that is immediately truncated before parsing for embedded markers
like "__RESULTS__" and "__EXIT_CODE__", causing loss of trailing markers; change
the logic in functions calling run (the blocks around the result = self.run(...)
calls and the later parsing that computes aggregate_exit_code and script_exit)
to first keep the full untruncated output_summary (e.g., store it as raw_output
or full_summary), parse that raw text to extract the status trailer and compute
aggregate_exit_code and script_exit from the "__RESULTS__"/"__EXIT_CODE__"
markers, and only after successful extraction perform the truncation used for
UI/display; apply the same change to the other occurrences referenced (the
blocks around lines handling aggregate_exit_code and script_exit at the other
call sites).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 299e6ca6-696c-47f9-bac0-43c1dfa2d6f6
📒 Files selected for processing (3)
- dev-suite/src/sandbox/e2b_runner.py
- dev-suite/src/sandbox/validation_commands.py
- dev-suite/tests/test_validation_commands.py
🚧 Files skipped from review as they are similar to previous changes (1)
- dev-suite/tests/test_validation_commands.py
```python
        r = subprocess.run(
            cmd, shell=True, capture_output=True, text=True, timeout={timeout},
        )
        results.append({{"cmd": cmd, "rc": r.returncode, "out": r.stdout, "err": r.stderr}})
    except subprocess.TimeoutExpired:
        results.append({{"cmd": cmd, "rc": -1, "out": "", "err": "TIMEOUT"}})

for r in results:
    print(r["out"])
    if r["err"] and r["err"] != "TIMEOUT":
        print(r["err"])
```
Surface per-command timeouts as actual timeouts.
A child timeout is reduced to rc = -1, its "TIMEOUT" marker is dropped from the summary, and the returned model still copies timed_out only from the outer sandbox call. That makes command timeouts indistinguishable from generic failures for the orchestrator and trace logs.
Also applies to: 417-419
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dev-suite/src/sandbox/e2b_runner.py` around lines 352 - 362, The per-command
TimeoutExpired handling currently sets rc=-1 and uses err="TIMEOUT" but doesn't
expose a timed_out flag and the summary drops the timeout marker; update the
subprocess TimeoutExpired except block (the subprocess.run call and results list
entries) to append a result object that includes a dedicated timed_out: True
field (e.g. {"cmd": cmd, "rc": -1, "out": "", "err": "TIMEOUT", "timed_out":
True}) so timeouts are surfaced as explicit booleans, and adjust any downstream
summary/printing logic that reads results (the r["err"]/r["rc"] checks and the
overall sandbox timed_out calculation) to respect per-command timed_out; apply
the identical change to the other occurrence mentioned (lines 417-419).
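The except block with the explicit flag might look like this (field names per the prompt; a standalone sketch, not the runner's actual wrapper code):

```python
import subprocess


def run_one(cmd: str, timeout: int) -> dict:
    """Run one shell command; timeouts carry a dedicated timed_out flag."""
    try:
        r = subprocess.run(cmd, shell=True, capture_output=True,
                           text=True, timeout=timeout)
        return {"cmd": cmd, "rc": r.returncode, "out": r.stdout,
                "err": r.stderr, "timed_out": False}
    except subprocess.TimeoutExpired:
        # Surface the timeout as a boolean, not just rc=-1 plus "TIMEOUT" text.
        return {"cmd": cmd, "rc": -1, "out": "", "err": "TIMEOUT",
                "timed_out": True}
```

Downstream, `timed_out = any(r["timed_out"] for r in results)` keeps per-command timeouts distinguishable from generic failures.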
🧹 Nitpick comments (1)
dev-suite/src/orchestrator.py (1)
626-642: Redundant conditional: `parsed_files` is guaranteed truthy here.

The guard at lines 608-615 already returns early when `parsed_files` is empty for these strategies. The `parsed_files if parsed_files else None` expressions at lines 629 and 641 can be simplified to just `parsed_files`.

♻️ Simplify the redundant conditionals:

```diff
 if plan.strategy == ValidationStrategy.SCRIPT_EXEC:
     trace.append(f"sandbox_validate: executing script {plan.script_file}")
     result = _run_sandbox_script(
         script_file=plan.script_file,
         template=plan.template,
-        parsed_files=parsed_files if parsed_files else None,
+        parsed_files=parsed_files,
     )
 # CR fix: fold LINT_ONLY into TEST_SUITE -- both run commands
 # sequentially via _run_sandbox_tests(). LINT_ONLY was falling
 # through to the skip branch instead of executing plan.commands.
 elif plan.strategy in {ValidationStrategy.TEST_SUITE, ValidationStrategy.LINT_ONLY}:
     trace.append(
         f"sandbox_validate: running {len(plan.commands)} command(s) sequentially"
     )
     result = _run_sandbox_tests(
         commands=plan.commands,
         template=plan.template,
-        parsed_files=parsed_files if parsed_files else None,
+        parsed_files=parsed_files,
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dev-suite/src/orchestrator.py` around lines 626 - 642, The conditional ternaries passing parsed_files are redundant because parsed_files is guaranteed non-empty earlier; update the calls to _run_sandbox_script(...) and _run_sandbox_tests(...) to pass parsed_files directly (replace parsed_files if parsed_files else None with parsed_files) for the branches handling plan and ValidationStrategy.TEST_SUITE / ValidationStrategy.LINT_ONLY; ensure you update both call sites that reference plan.template, plan.script_file, and plan.commands while leaving all other logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@dev-suite/src/orchestrator.py`:
- Around line 626-642: The conditional ternaries passing parsed_files are
redundant because parsed_files is guaranteed non-empty earlier; update the calls
to _run_sandbox_script(...) and _run_sandbox_tests(...) to pass parsed_files
directly (replace parsed_files if parsed_files else None with parsed_files) for
the branches handling plan and ValidationStrategy.TEST_SUITE /
ValidationStrategy.LINT_ONLY; ensure you update both call sites that reference
plan.template, plan.script_file, and plan.commands while leaving all other logic
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a366c29e-7ece-4e01-98e0-ed4e1f8a26a4
📒 Files selected for processing (2)
dev-suite/src/orchestrator.pydev-suite/tests/test_sandbox_validate.py
… skips pytest

CR fixes addressed:
- e2b_runner.py: change `elif` to `if` so rc=127 (missing tool) both warns AND sets aggregate_exit_code (Critical: missing pnpm/tsc passed)
- validation_commands.py: fullstack branch now checks _has_test_files() and uses PYTHON_LINT_COMMANDS when no tests present (Major: "no tests collected" failure for mixed frontend+Python without test files)
- test_validation_commands.py: existing mixed tests now include test files, added test_mixed_frontend_and_python_no_tests for lint-only path
Summary
Fixes #95: sandbox validation now dispatches on a `ValidationStrategy` enum instead of blindly running `ruff + pytest` for all Python files.

Root Cause (from investigation)
The original issue was reported as an E2B timeout, but reproduction tests and Langfuse trace analysis revealed three structural bugs:
- `ruff` not installed on `code-interpreter-v1`: every sandbox run failed with `ruff: not found` (exit 127)
- `&&` chain masked failures: a ruff failure prevented pytest from running, but the Python wrapper exited 0, so `SandboxResult.exit_code=0`
- `hello_world.py` with no test suite got `ruff + pytest tests/`, which validated nothing

The timeout was not reproducible (sandbox creation + execution completes in <1s).
Changes
validation_commands.py
- `ValidationStrategy` enum: `TEST_SUITE`, `SCRIPT_EXEC`, `LINT_ONLY`, `SKIP`
- Test files detected via the `TEST_INDICATORS` set → `TEST_SUITE` (ruff + pytest)
- Single script without a test suite → `SCRIPT_EXEC` (run the script, check exit code)
- Frontend files → `TEST_SUITE` (pnpm check + tsc)
- Nothing runnable → `SKIP`
- `which ruff` check before running lint: skips with `[WARN]` if unavailable
- `script_file` field on `ValidationPlan` for `SCRIPT_EXEC` targets

e2b_runner.py
- `validation_skipped` and `warnings` fields on `SandboxResult`
- `run_tests()` refactored: sequential command execution (each `subprocess.run()` independent). Missing ruff no longer blocks pytest. Parses `__RESULTS__` JSON for per-command exit codes.
- `run_script()`: new method for `SCRIPT_EXEC`: writes files, runs `python <file>`, reports 1 passed/1 failed

orchestrator.py
- `_run_sandbox_tests()` replaces `_run_sandbox_validation()`: passes `commands` list (not compound string)
- `_run_sandbox_script()`: new helper for `SCRIPT_EXEC` dispatch
- `sandbox_validate_node` dispatches on `plan.strategy`:
  - `SKIP` → returns `SandboxResult(validation_skipped=True)`
  - `SCRIPT_EXEC` → calls `_run_sandbox_script()`
  - `TEST_SUITE` → calls `_run_sandbox_tests()`
- `test_validation_commands.py`: 24 tests covering strategy selection for all file-type combinations
- `test_sandbox_validate.py`: 15 tests covering strategy dispatch, `validation_skipped` flag, warnings in trace, graph wiring

Related
- `dev-suite/scripts/reproduce_timeout.py`

Summary by CodeRabbit
New Features
Bug Fixes
Tests