feat(agentv-bench): add executor subagent for non-CLI targets by christso · Pull Request #808 · EntityProcess/agentv

christso · 2026-03-28T08:26:45Z

Summary

Create agents/executor.md — executor subagent that performs eval test cases (mirrors grader pattern)
Update SKILL.md: executor dispatch flow, subagent_mode_allowed docs, .agentv/targets.yaml location, ${{ ENV_VAR }} security note
Update eval-yaml-spec.md: manifest-based opt-out, two invoke.json kinds (cli/agent)
Update run_tests.py docstring

Skill-side companion to #804 and #807.

Test plan

All pre-push checks pass (build, typecheck, lint, tests, validate examples)
Executor subagent provides clear step-by-step instructions with workspace isolation

🤖 Generated with Claude Code

Add executor subagent that performs eval test cases directly when the target is a non-CLI provider. Mirrors the grader pattern — one executor per test case, all dispatched in parallel. - Create agents/executor.md with workspace isolation guardrails - Update SKILL.md: executor dispatch flow, subagent_mode_allowed docs, targets.yaml location and ${{ ENV_VAR }} security note - Update eval-yaml-spec.md: manifest-based opt-out, two invoke.json kinds - Update run_tests.py docstring Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-03-28T08:27:31Z

Deploying agentv with Cloudflare Pages

Latest commit:	`9c7726c`
Status:	✅ Deploy successful!
Preview URL:	https://9adc4eaf.agentv.pages.dev
Branch Preview URL:	https://feat-797-executor-subagent-s.agentv.pages.dev

View logs

Executor should have access to all tools, not a hardcoded subset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rename input.json fields to match the eval YAML schema and code grader SDK naming conventions: - input_text + input_messages → input (Message[]) - file_paths → input_files (string[]) Drop redundant input_text — derive it from input[0].content where needed. Remove unnecessary user_notes step from executor subagent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…er payload The code grader SDK uses input_text and expected_output_text (via toSnakeCaseDeep). The question and reference_answer fields were redundant legacy names not present in the real orchestrator payload. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The executor works in the current workspace naturally. No need to restrict or parameterize the working directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…agents Clarify that the current workspace is the target workspace (multi-repo), not the eval repo. Warn if the user opened the wrong workspace since executor subagents won't have access to the agent's skills and repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rkspace-dependent evals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Single message: returns content directly (no role prefix). Multiple messages: prefixes each with @ROLE for least surprise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

christso and others added 14 commits March 28, 2026 08:30

fix(agentv-bench): remove tools restriction from executor subagent

8e928df

Executor should have access to all tools, not a hardcoded subset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(agentv-bench): add metadata field to executor input.json docs

a5e2085

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(pipeline): rename questionText to inputText for consistency

2661ee1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(agentv-bench): update stale input_text/input_messages references

e3db11a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(agentv-bench): simplify executor — remove workspace-dir parameter

72e7cac

The executor works in the current workspace naturally. No need to restrict or parameterize the working directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(agentv-bench): clarify workspace detection — only matters for wo…

771b142

…rkspace-dependent evals Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(pipeline): extractInputText should use all messages, not just first

e54efb6

Single message: returns content directly (no role prefix). Multiple messages: prefixes each with @ROLE for least surprise. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(eval): fix inputText jsdoc — includes all messages, not just first

bcb87fd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(eval): fix expectedOutputText jsdoc — last message content, not all

ae8cbec

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(pipeline): use @[role] pattern for multi-message inputText

cf4690b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(pipeline): align @[role]: format with artifact-writer convention

9c7726c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

christso merged commit 6305275 into main Mar 28, 2026
2 checks passed

christso deleted the feat/797-executor-subagent-skill branch March 28, 2026 11:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agentv-bench): add executor subagent for non-CLI targets#808

feat(agentv-bench): add executor subagent for non-CLI targets#808
christso merged 15 commits intomainfrom
feat/797-executor-subagent-skill

christso commented Mar 28, 2026

Uh oh!

cloudflare-workers-and-pages bot commented Mar 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

christso commented Mar 28, 2026

Summary

Test plan

Uh oh!

cloudflare-workers-and-pages bot commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying agentv with Cloudflare Pages

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cloudflare-workers-and-pages bot commented Mar 28, 2026 •

edited

Loading