Skip to content

feat(agentv-bench): add executor subagent for non-CLI targets#808

Merged
christso merged 15 commits intomainfrom
feat/797-executor-subagent-skill
Mar 28, 2026
Merged

feat(agentv-bench): add executor subagent for non-CLI targets#808
christso merged 15 commits intomainfrom
feat/797-executor-subagent-skill

Conversation

@christso
Copy link
Copy Markdown
Collaborator

Summary

  • Create agents/executor.md — executor subagent that performs eval test cases (mirrors grader pattern)
  • Update SKILL.md: executor dispatch flow, subagent_mode_allowed docs, .agentv/targets.yaml location, ${{ ENV_VAR }} security note
  • Update eval-yaml-spec.md: manifest-based opt-out, two invoke.json kinds (cli/agent)
  • Update run_tests.py docstring

Skill-side companion to #804 and #807.

Test plan

  • All pre-push checks pass (build, typecheck, lint, tests, validate examples)
  • Executor subagent provides clear step-by-step instructions with workspace isolation

🤖 Generated with Claude Code

Add executor subagent that performs eval test cases directly when the
target is a non-CLI provider. Mirrors the grader pattern — one executor
per test case, all dispatched in parallel.

- Create agents/executor.md with workspace isolation guardrails
- Update SKILL.md: executor dispatch flow, subagent_mode_allowed docs,
  targets.yaml location and ${{ ENV_VAR }} security note
- Update eval-yaml-spec.md: manifest-based opt-out, two invoke.json kinds
- Update run_tests.py docstring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages bot commented Mar 28, 2026

Deploying agentv with  Cloudflare Pages  Cloudflare Pages

Latest commit: 9c7726c
Status: ✅  Deploy successful!
Preview URL: https://9adc4eaf.agentv.pages.dev
Branch Preview URL: https://feat-797-executor-subagent-s.agentv.pages.dev

View logs

christso and others added 14 commits March 28, 2026 08:30
Executor should have access to all tools, not a hardcoded subset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename input.json fields to match the eval YAML schema and code grader
SDK naming conventions:
- input_text + input_messages → input (Message[])
- file_paths → input_files (string[])

Drop redundant input_text — derive it from input[0].content where needed.
Remove unnecessary user_notes step from executor subagent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er payload

The code grader SDK uses input_text and expected_output_text (via
toSnakeCaseDeep). The question and reference_answer fields were
redundant legacy names not present in the real orchestrator payload.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The executor works in the current workspace naturally. No need to
restrict or parameterize the working directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…agents

Clarify that the current workspace is the target workspace (multi-repo),
not the eval repo. Warn if the user opened the wrong workspace since
executor subagents won't have access to the agent's skills and repos.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rkspace-dependent evals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single message: returns content directly (no role prefix).
Multiple messages: prefixes each with @ROLE for least surprise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@christso christso merged commit 6305275 into main Mar 28, 2026
2 checks passed
@christso christso deleted the feat/797-executor-subagent-skill branch March 28, 2026 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant