Labels
enhancement (New feature or request), wui (Relates to the browser dashboard / web UI runtime)
Objective
Add a dashboard surface for monitoring and controlling active eval orchestration workflows (autoresearch loops, Ralph Loops, mutator campaigns). Users should be able to watch, pause, resume, and adjust active campaigns from the browser without touching the CLI.
Architecture Boundary
external-first — a dashboard UI layer on top of the existing CLI orchestration engines (#748 autoresearch, #699 Ralph Loop, #746 mutator subagent). It does not modify the engines themselves.
What this enables
Currently, #748/#699/#746 are CLI-only. Users start a loop and either watch terminal output or check back later. The orchestration monitor turns the dashboard into a command center:
- Watch active campaigns in real-time (score trajectory, iteration count, cost burn)
- Control campaigns (pause/resume/stop, adjust thresholds mid-run)
- Review campaign history (iteration progression, decisions made, artifacts produced)
- Compare campaign iterations side-by-side
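To make the surface above concrete, here is a sketch of the campaign state shape the dashboard could consume. All type and field names are illustrative assumptions, not an existing schema from #748/#699/#746:

```typescript
// Hypothetical campaign state consumed by the dashboard.
// Names are assumptions; the real schema is part of Design Latitude below.
type CampaignStatus = "active" | "paused" | "completed" | "stopped";

interface IterationRecord {
  index: number;
  score: number;        // e.g. pass rate in [0, 1]
  costUsd: number;
  durationMs: number;
  mutation?: string;    // description of the mutation applied, if any
  kept: boolean;        // verdict: kept or dropped
}

interface CampaignState {
  id: string;
  engine: "autoresearch" | "ralph-loop" | "mutator";
  status: CampaignStatus;
  targetScore: number;      // termination criterion, e.g. 0.85
  maxIterations: number;
  costBudgetUsd: number;
  iterations: IterationRecord[];
}
```

A shape like this would back all four capabilities: live status from `status` and the tail of `iterations`, controls by mutating `targetScore`/`maxIterations`/`costBudgetUsd`, history and comparison from the full `iterations` array.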
Proposed views
Campaign List
- Active campaigns with live status indicators
- Historical campaigns with final results
- Filters: status (active/completed/stopped), target, date range
Campaign Detail
- Score trajectory chart: score over iterations with annotations (mutations, threshold changes)
- Iteration table: each iteration with score, cost, duration, mutation applied, verdict (kept/dropped)
- Termination criteria: progress bar toward target (e.g., "85% pass rate: currently at 81%")
- Cost burn rate: projected total cost based on current trajectory
- Controls: pause, resume, stop, adjust target threshold, adjust max iterations, adjust cost budget
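The termination-progress and cost-burn widgets reduce to two small computations. A minimal sketch, assuming a naive linear extrapolation of per-iteration cost (real trajectories may be nonlinear):

```typescript
// Fraction of the way to the target score, clamped to [0, 1].
// e.g. target 85% pass rate, currently at 81% -> ~0.95 of the way there.
function terminationProgress(currentScore: number, targetScore: number): number {
  if (targetScore <= 0) return 1;
  return Math.min(1, currentScore / targetScore);
}

// Projected total cost: spend so far plus the average per-iteration cost
// extrapolated over the remaining iteration budget.
function projectTotalCost(costsSoFar: number[], maxIterations: number): number {
  if (costsSoFar.length === 0) return 0;
  const spent = costsSoFar.reduce((a, b) => a + b, 0);
  const avgPerIteration = spent / costsSoFar.length;
  const remaining = Math.max(0, maxIterations - costsSoFar.length);
  return spent + avgPerIteration * remaining;
}
```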
Campaign Comparison
- Select 2+ campaigns → compare final scores, cost efficiency, iteration count
- Identify which mutation strategies were most effective
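One way to frame "cost efficiency" for the comparison view is score per dollar. A sketch under that assumption (the summary shape and metric are illustrative, not a settled design):

```typescript
// Per-campaign rollup used by the comparison view (hypothetical shape).
interface CampaignSummary {
  id: string;
  finalScore: number;   // e.g. final pass rate in [0, 1]
  totalCostUsd: number;
  iterations: number;
}

// Ranks campaigns by score points per dollar, best first.
function rankByCostEfficiency(campaigns: CampaignSummary[]): CampaignSummary[] {
  return [...campaigns].sort(
    (a, b) => b.finalScore / b.totalCostUsd - a.finalScore / a.totalCostUsd
  );
}
```

Grouping the same rollups by mutation strategy (rather than campaign id) would support the "which strategies were most effective" question.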
Design Latitude
- How campaigns communicate state to dashboard (filesystem polling, IPC, WebSocket)
- Whether controls are real-time (WebSocket to running process) or deferred (write config file, process picks up on next iteration)
- Campaign metadata schema
- How to handle campaigns that started before dashboard was running
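As one point in this design space, the "deferred controls" option could be as simple as the dashboard serializing a control request that the running process reads between iterations. Everything here (file contents, field names, the stale-request rule) is an assumption for illustration:

```typescript
// Hypothetical deferred-control payload: the dashboard writes this JSON to a
// well-known path in the campaign's working directory; the engine checks for
// it at the top of each iteration and ignores requests older than the run.
type ControlAction = "pause" | "resume" | "stop";

interface ControlRequest {
  action?: ControlAction;
  targetScore?: number;     // adjust threshold mid-run
  maxIterations?: number;
  costBudgetUsd?: number;
  issuedAt: string;         // ISO timestamp so stale requests can be discarded
}

function buildControlRequest(
  partial: Omit<ControlRequest, "issuedAt">
): string {
  return JSON.stringify({ ...partial, issuedAt: new Date().toISOString() });
}
```

The trade-off versus a WebSocket channel: deferred controls take effect only at iteration boundaries, but they require no changes to how the engines are launched and work for campaigns started before the dashboard was running.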
Acceptance Signals
- Dashboard shows active campaigns with live score updates
- Users can pause/resume/stop a campaign from the dashboard
- Score trajectory chart updates as iterations complete
- Cost burn rate and termination progress are visible
- Campaign history is browsable with iteration detail
- Campaign comparison works for 2+ campaigns
Non-Goals
- Visual DAG editor for building eval chains (use YAML + CLI)
- Replacing CLI as the primary way to start campaigns
- Multi-user concurrent campaign management
Dependencies
- feat(bench): autoresearch mode — unattended eval-improve loop with hill-climbing ratchet (#748) — the engine this monitors
- feat(eval): Ralph Loop — iterative improvement with feedback injection (#699) — the feedback injection engine
- feat(bench): mutator subagent — autonomous artifact rewriting from failure analysis (#746) — the artifact rewriting engine
- feat: AgentV Studio — eval management platform with historical trends, quality gates, and orchestration (#563) — the UI platform this lives in
Research source
- melagiri/code-insights — real-time session analysis with activity charts and progressive updates