tracking: AgentV Studio — eval management platform with quality gates, orchestration, and analysis #788

@christso

Summary

Upgrade the agentv dashboard from a read-only viewer into a full AgentV Studio — a management and analysis platform for quality gate enforcement, orchestration monitoring, regression detection, cost attribution, root cause diagnosis, and pattern synthesis.

Inspired by melagiri/code-insights (React + Vite + Hono dashboard with real-time session analysis, cost tracking, pattern detection).

Architecture Boundary Summary

| Layer | Scope | Issues |
| --- | --- | --- |
| Platform foundation | React+Vite SPA, Hono server, history API, run management | #563 |
| Quality enforcement | Dashboard gate config, regression alerts, cost views | #334, #335, #635 |
| Orchestration control | Campaign monitoring, pause/resume/stop active loops | #785 |
| Deep analysis | Trace-driven root cause diagnosis, pattern synthesis | #786, #787 |

Current Implementation Status

| Component | Status |
| --- | --- |
| html-writer.ts (static HTML report) | Implemented |
| History repo architecture | Not started |
| React+Vite dashboard scaffold | Not started |
| Hono API server | Not started |
| SSE progressive visualization | Not started |
| Quality gate engine (severity, remediation) | Not started |
| Quality gate dashboard UI | Not started |
| Regression detection engine | Not started |
| Regression alert visualization | Not started |
| Cost computation engine | Not started |
| Cost attribution dashboard views | Not started |
| Orchestration monitor UI | Not started |
| Root cause explorer | Not started |
| Code-insights pattern synthesis | Not started |
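
Since the regression detection engine is not started, the following is only a rough sketch of what a cross-run comparison might look like. The RunScores shape, the detectRegressions name, and the 0.1 default threshold are all hypothetical and are not taken from #335.

```typescript
// Hypothetical shape: test id -> score in [0, 1] for a single eval run.
type RunScores = Record<string, number>;

interface Regression {
  testId: string;
  baseline: number;
  current: number;
  drop: number;
}

// Flag tests whose score fell by more than `threshold` relative to the baseline run.
// Tests missing from the current run are skipped because there is nothing to compare.
function detectRegressions(
  baseline: RunScores,
  current: RunScores,
  threshold = 0.1, // assumed default, not from the issue
): Regression[] {
  const regressions: Regression[] = [];
  for (const [testId, base] of Object.entries(baseline)) {
    const cur = current[testId];
    if (cur === undefined) continue;
    const drop = base - cur;
    if (drop > threshold) {
      regressions.push({ testId, baseline: base, current: cur, drop });
    }
  }
  // Largest drops first, so the worst regressions surface at the top of an alert list.
  return regressions.sort((a, b) => b.drop - a.drop);
}

const found = detectRegressions(
  { a: 0.9, b: 0.8, c: 1.0 },
  { a: 0.5, b: 0.75, d: 1.0 },
);
console.log(found.map((r) => r.testId));
```

A real engine would likely also need iteration tracking and a termination taxonomy, as the #335 title suggests; this sketch covers only the score comparison.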

Dependency Graph

Phase 1: Platform Foundation
  #563 (AgentV Studio platform)
    - React+Vite scaffold
    - Hono API server
    - History repo integration
    - Run management views

Phase 2: Quality Enforcement (parallel tracks)
  #334 (quality gates)
  #335 (regression alerts)       -- all depend on #563 platform
  #635 (cost attribution)

Phase 3: Orchestration Control
  #785 (orchestration monitor)
    - depends on #563 platform
    - depends on #748, #699, #746 engines

Phase 4: Deep Analysis (parallel tracks)
  #786 (root cause explorer)     -- depends on #563 platform + #335 regression data
  #787 (code-insights)           -- depends on #563 platform + #334 for rule export

Parallel Execution Waves

Wave 1 — Platform (sequential prerequisite)

Wave 2 — Quality Enforcement (parallel after Wave 1)

Wave 3 — Orchestration (after Wave 1, can overlap Wave 2)

Wave 4 — Deep Analysis (after Waves 1-2)

Merge Order (low-conflict default)

  1. feat: AgentV Studio — eval management platform with historical trends, quality gates, and orchestration #563 (platform) — foundational, merge first
  2. feat: compute costUsd from token usage via model pricing table #635 (cost) — smallest scope, least conflict
  3. feat(eval): composable quality gates with auto-remediation triggers #334 (quality gates) — independent of #335
  4. feat(eval): iteration tracking, termination taxonomy, and cross-run regression detection #335 (regression) — independent of #334
  5. feat(dashboard): orchestration monitor — campaign management UI #785 (orchestration) — depends on platform only
  6. feat(dashboard): root cause explorer — trace-driven failure diagnosis #786 (root cause) — depends on #335 regression data
  7. feat(dashboard): code-insights integration — pattern synthesis from eval sessions #787 (code-insights) — depends on platform + #334 for rule export
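
The phases, waves, and merge order above all follow from the same dependency edges, so they can be cross-checked mechanically. The sketch below groups the sub-issues into waves by dependency depth. The edges are transcribed from the dependency graph and the merge-order notes (#787 is given #563 and #334, following the merge-order note), the external engine issues #748, #699, and #746 are left out, and the planWaves helper is a hypothetical name used for illustration only.

```typescript
// Edges transcribed from the dependency graph and merge-order notes in this issue.
// External engine issues (#748, #699, #746) are omitted as they sit outside this epic.
const dependsOn: Record<string, string[]> = {
  "#563": [],
  "#334": ["#563"],
  "#335": ["#563"],
  "#635": ["#563"],
  "#785": ["#563"],
  "#786": ["#563", "#335"],
  "#787": ["#563", "#334"],
};

// Group issues into waves: an issue joins the first wave in which all of its
// dependencies were already placed in an earlier wave.
function planWaves(graph: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  let remaining = Object.keys(graph);
  while (remaining.length > 0) {
    const ready = remaining.filter((id) => graph[id].every((dep) => done.has(dep)));
    if (ready.length === 0) {
      throw new Error(`Dependency cycle or unknown dependency among: ${remaining.join(", ")}`);
    }
    ready.forEach((id) => done.add(id));
    waves.push(ready);
    remaining = remaining.filter((id) => !done.has(id));
  }
  return waves;
}

const waves = planWaves(dependsOn);
waves.forEach((wave, i) => console.log(`Wave ${i + 1}: ${wave.join(", ")}`));
```

Pure dependency depth yields three levels rather than four: #785 lands next to the quality enforcement issues, which matches the note that Wave 3 can overlap Wave 2. The four-wave split in this issue is a grouping by theme on top of that ordering.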

Subagent Operating Contract

Completion Criteria

  • All 7 sub-issues closed
  • agentv serve serves the dashboard as a React SPA
  • Quality gates configurable from dashboard UI
  • Regression alerts visible in real time
  • Active campaigns monitorable and controllable
  • Cost attribution visible per evaluator/category/target
  • Root cause explorer links regressions to trace-level diagnosis
  • Pattern synthesis extracts insights across runs
  • Static --format html report continues to work independently
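
To make the cost attribution criterion concrete, here is a minimal sketch of deriving costUsd from token usage with a pricing table and rolling it up per evaluator. The model names, prices, and the UsageRecord shape are hypothetical placeholders; the real pricing table and field names would be whatever #635 defines.

```typescript
// Hypothetical prices in USD per million tokens; not real model pricing.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING: Record<string, ModelPrice> = {
  "example-small": { inputPerMTok: 1, outputPerMTok: 4 },
  "example-large": { inputPerMTok: 10, outputPerMTok: 30 },
};

// Hypothetical usage record emitted per evaluator call.
interface UsageRecord {
  evaluator: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Returns undefined for unknown models instead of guessing a price.
function computeCostUsd(u: UsageRecord): number | undefined {
  const price = PRICING[u.model];
  if (!price) return undefined;
  return (u.inputTokens * price.inputPerMTok + u.outputTokens * price.outputPerMTok) / 1_000_000;
}

// Sum known costs per evaluator; records with unknown models are skipped.
function costByEvaluator(records: UsageRecord[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    const cost = computeCostUsd(r);
    if (cost === undefined) continue;
    totals.set(r.evaluator, (totals.get(r.evaluator) ?? 0) + cost);
  }
  return totals;
}

const totals = costByEvaluator([
  { evaluator: "llm-judge", model: "example-large", inputTokens: 200_000, outputTokens: 50_000 },
  { evaluator: "llm-judge", model: "example-small", inputTokens: 1_000_000, outputTokens: 0 },
  { evaluator: "rubric", model: "unknown-model", inputTokens: 10, outputTokens: 10 },
]);
console.log(totals);
```

The same roll-up could key on category or target instead of evaluator. Skipping unknown models keeps totals from silently including made-up prices, though a dashboard would probably want to surface those records as unpriced.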

Research Source

  • melagiri/code-insights — React+Vite+Hono dashboard, real-time session analysis, cost tracking, pattern detection, AI fluency scoring, rule generation
  • DeepEval Confident AI — cloud eval dashboard with trends
  • Convex Evals — React dashboard with category breakdown
  • Anthropic skill-creator — eval-viewer with grading.json schema alignment

Metadata

Labels: epic (tracking issue for a multi-issue initiative), wui (relates to the browser dashboard / web UI runtime)
Status: Ready