
feat(dashboard): code-insights integration — pattern synthesis from eval sessions #787

@christso

Description


Objective

Extract and synthesize learnings from eval sessions — decisions with trade-offs, friction points, effective patterns — and surface them in the dashboard. Inspired by melagiri/code-insights, which transforms AI coding sessions into actionable knowledge.

Architecture Boundary

external-first — dashboard analysis layer + optional plugin. Does not modify core eval engine.

What this enables

Eval runs generate rich data, but insights are lost between sessions. This feature extracts durable knowledge:

  • Decision extraction: What trade-offs did the agent make? What alternatives existed?
  • Friction points: Which test categories consistently cause problems? Which tool calls fail most?
  • Effective patterns: What prompt structures, tool sequences, or strategies correlate with high scores?
  • Pattern export: Convert insights into quality gate rules or CLAUDE.md recommendations

Proposed capabilities

Insight Extraction (per-run)

  • Analyze eval traces to extract: decisions made, tools used, error recovery patterns
  • Score each insight by impact (how much did it affect the final score?)
  • Tag insights by category (retrieval, reasoning, tool use, formatting)
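
One possible shape for a per-run insight record and its impact score is sketched below. All names (`Insight`, `scoreImpact`, the field names) are illustrative assumptions, not a committed schema:

```typescript
// Hypothetical shape for an extracted insight; not a committed schema.
type InsightCategory = "retrieval" | "reasoning" | "tool-use" | "formatting";

interface Insight {
  runId: string;
  category: InsightCategory;
  summary: string; // e.g. "retried failed tool call with a narrower query"
  impact: number;  // 0..1: estimated contribution to the final score
}

// Naive impact estimate: attribute the run's deviation from the campaign
// baseline to the insight, clamped to [0, 1].
function scoreImpact(runScore: number, baselineScore: number): number {
  return Math.min(1, Math.abs(runScore - baselineScore));
}
```

In practice the impact estimate would likely come from trace-level attribution rather than a single score delta; the sketch only fixes the output range.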

Pattern Synthesis (cross-run)

  • Aggregate insights across runs within a campaign
  • Identify: recurring friction points, consistently effective strategies, degrading patterns
  • Synthesis window: configurable (last 5 runs, last 7 days, full campaign)
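
The configurable window could be modeled as a small discriminated union; the option names below (`lastRuns`, `lastDays`, `campaign`) are assumptions for illustration, not an existing API:

```typescript
// Sketch of a configurable synthesis window; option names are illustrative.
type SynthesisWindow =
  | { kind: "lastRuns"; n: number }
  | { kind: "lastDays"; days: number }
  | { kind: "campaign" };

interface RunRecord {
  id: string;
  finishedAt: Date;
}

// Select the runs that fall inside the window, newest first.
function selectRuns(
  runs: RunRecord[],
  window: SynthesisWindow,
  now: Date,
): RunRecord[] {
  const sorted = [...runs].sort(
    (a, b) => b.finishedAt.getTime() - a.finishedAt.getTime(),
  );
  switch (window.kind) {
    case "lastRuns":
      return sorted.slice(0, window.n);
    case "lastDays": {
      const cutoff = now.getTime() - window.days * 24 * 60 * 60 * 1000;
      return sorted.filter((r) => r.finishedAt.getTime() >= cutoff);
    }
    case "campaign":
      return sorted;
  }
}
```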

Dashboard Views

  • Insights feed: Recent insights ordered by impact
  • Pattern trends: Which patterns are becoming more/less effective over time
  • Friction heatmap: Test categories × failure types, showing persistent problem areas
  • Recommendations: Auto-generated suggestions based on pattern analysis
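
The heatmap view reduces to a count per (test category, failure type) cell. A minimal aggregation sketch, with assumed names:

```typescript
// Sketch of friction-heatmap aggregation: count failures per
// (test category, failure type) cell. Field names are illustrative.
interface FailureEvent {
  category: string;    // e.g. "retrieval"
  failureType: string; // e.g. "timeout"
}

function frictionHeatmap(events: FailureEvent[]): Map<string, number> {
  const cells = new Map<string, number>();
  for (const e of events) {
    const key = `${e.category}:${e.failureType}`;
    cells.set(key, (cells.get(key) ?? 0) + 1);
  }
  return cells;
}
```

Persistent problem areas would then show up as cells whose counts stay high across consecutive synthesis windows.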

Export

  • Quality gate rules derived from recurring friction points
  • CLAUDE.md recommendations capturing consistently effective patterns

Design Latitude

  • Whether insight extraction uses LLM analysis or heuristic pattern matching
  • Synthesis algorithm (frequency-based, score-correlation, or LLM-summarized)
  • Storage format for extracted insights
  • How deep the code-insights integration goes (import their data vs. re-implement their approach)
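
To make the score-correlation option concrete, one simple heuristic is the mean-score lift of runs where a pattern appears versus runs where it does not. This is a sketch of one candidate, not a chosen algorithm:

```typescript
// Score-correlation heuristic: mean score of runs with the pattern minus
// mean score of runs without it. Positive lift suggests the pattern helps.
function patternLift(scores: number[], hasPattern: boolean[]): number {
  let withSum = 0, withN = 0, withoutSum = 0, withoutN = 0;
  for (let i = 0; i < scores.length; i++) {
    if (hasPattern[i]) { withSum += scores[i]; withN++; }
    else { withoutSum += scores[i]; withoutN++; }
  }
  if (withN === 0 || withoutN === 0) return 0; // lift undefined, treat as neutral
  return withSum / withN - withoutSum / withoutN;
}
```

A frequency-based variant would instead count pattern occurrences; an LLM-summarized variant would trade determinism for richer explanations.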

Acceptance Signals

  • Per-run insights extracted from eval traces
  • Cross-run pattern synthesis identifies friction points and effective patterns
  • Dashboard displays insights feed with impact scores
  • Pattern trends visible over time
  • At least one export format (quality gate rules or CLAUDE.md recommendations)

Non-Goals

  • Real-time session monitoring (code-insights' primary use case)
  • AI fluency scoring (a code-insights feature that doesn't map to the eval framework)
  • Multi-tool analysis (focus on agentv eval data, not external tool sessions)
  • Replacing code-insights (complementary, not competitive)

Dependencies

Research source

  • melagiri/code-insights — session analysis, pattern detection, weekly synthesis, rule generation, AI fluency scoring
  • code-insights architecture: Vite + React SPA, Hono API server, SQLite local-first storage, Ollama for free LLM analysis

Metadata


    Labels

    enhancement (New feature or request)
    wui (Relates to the browser dashboard / web UI runtime)
