feat(eval): add path-derived category field for hierarchical grouping #816

Merged
christso merged 16 commits into main from feat/category-field
Mar 28, 2026

christso (Collaborator) commented Mar 28, 2026

Summary

  • Add path-derived category field to eval pipeline for hierarchical organization: Category > Dataset > Test ID
  • Categories are derived from eval file directory structure (no new YAML field)
  • Propagate through: EvalTest → EvaluationResult → JSONL/manifest → Studio API → Studio UI
  • Studio gets two-level drill-down with collapsible category sections, new routes, and updated breadcrumbs

Closes #813

Implementation Plan

Task 1: Add category to core types

  • Add readonly category?: string to EvalTest and EvaluationResult interfaces
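
A minimal sketch of what Task 1 amounts to. Only the `category` property comes from this PR; the other fields (`id`, `testId`, `score`) are placeholders, and the real interfaces presumably carry many more members.

```typescript
// Trimmed, hypothetical shapes: only `category` is taken from this PR.
interface EvalTest {
  readonly id: string;
  readonly category?: string; // optional, so tests built without it stay valid
}

interface EvaluationResult {
  readonly testId: string;
  readonly score: number;
  readonly category?: string; // intended to mirror the originating EvalTest
}

// A test constructed before this change still type-checks.
const legacyTest: EvalTest = { id: "greets-user" };

// A test loaded from a file inside a `safety/` directory (illustrative value).
const categorizedTest: EvalTest = { id: "refuses-jailbreak", category: "safety" };
```

Keeping the property optional is what lets older results, which have no category, flow through unchanged.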

Task 2: Extract deriveCategory to @agentv/core

  • Move from discover.ts to shared packages/core/src/evaluation/category.ts
  • Change fallback from "root" to "Uncategorized"
  • Export DEFAULT_CATEGORY constant
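
The PR does not show the body of `deriveCategory`, so the following is only a sketch under stated assumptions: the input is a path relative to the eval root, the category is the directory portion joined with `/`, and a file with no parent directory falls back to `DEFAULT_CATEGORY`. The shared module may well apply different rules (for example, stripping certain directory names).

```typescript
// In the shared module these would be exported; omitted here for brevity.
const DEFAULT_CATEGORY = "Uncategorized";

// Assumed rule: category = directory part of the relative path.
function deriveCategory(relativePath: string): string {
  const segments = relativePath
    .split(/[\\/]+/) // accept both POSIX and Windows separators
    .filter((segment) => segment.length > 0 && segment !== ".");
  const directories = segments.slice(0, -1); // drop the file name
  return directories.length > 0 ? directories.join("/") : DEFAULT_CATEGORY;
}

const nested = deriveCategory("safety/jailbreak/basic.eval.yaml");
const rootLevel = deriveCategory("basic.eval.yaml");
```

Under these assumptions, `nested` is `"safety/jailbreak"` and `rootLevel` is `"Uncategorized"`.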

Task 3: Propagate category through YAML parser

  • Add category to LoadOptions, assign to each parsed test case

Task 4: Pass category from CLI run-eval to parser

  • Derive category from file path in prepareFileMetadata, pass to loadTestSuite
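
Tasks 3 and 4 together can be pictured roughly as below. `loadCasesSketch` and `RawCase` are hypothetical stand-ins, not the real `loadTestSuite` signature; the point is only that the caller derives the category once per file and the loader stamps it on every parsed case.

```typescript
interface LoadOptions {
  readonly category?: string;
}

// Hypothetical pre-parsed case; the real parser reads these from YAML.
interface RawCase {
  readonly id: string;
}

interface EvalTest {
  readonly id: string;
  readonly category?: string;
}

// Stand-in for the parser: copies the caller-supplied category onto each case.
function loadCasesSketch(
  rawCases: readonly RawCase[],
  options: LoadOptions = {},
): EvalTest[] {
  return rawCases.map((raw): EvalTest =>
    options.category === undefined
      ? { id: raw.id }
      : { id: raw.id, category: options.category },
  );
}

// The CLI side would pass the category it derived from the file path.
const tests = loadCasesSketch([{ id: "a" }, { id: "b" }], { category: "safety" });
```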

Task 5: Pass category through orchestrator

  • Add category: evalCase.category in buildEvaluationResultCommon() and all error paths
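
One way to keep `category` from being dropped on a less-travelled path is to build the shared fields in a single helper, as sketched here. This is an illustration, not the orchestrator's actual structure: `commonFields`, `successResult`, `errorResult`, and the choice of `score: 0` for errors are all assumptions.

```typescript
interface EvalCase {
  readonly id: string;
  readonly category?: string;
}

interface EvaluationResult {
  readonly testId: string;
  readonly category?: string;
  readonly score: number;
  readonly error?: string;
}

// Fields shared by every outcome, built in one place.
function commonFields(evalCase: EvalCase): { testId: string; category?: string } {
  return evalCase.category === undefined
    ? { testId: evalCase.id }
    : { testId: evalCase.id, category: evalCase.category };
}

function successResult(evalCase: EvalCase, score: number): EvaluationResult {
  return { ...commonFields(evalCase), score };
}

// Error outcomes reuse the same helper, so the category survives failures too.
function errorResult(evalCase: EvalCase, message: string): EvaluationResult {
  return { ...commonFields(evalCase), score: 0, error: message };
}

const evalCase: EvalCase = { id: "t1", category: "safety" };
const ok = successResult(evalCase, 0.9);
const failed = errorResult(evalCase, "provider timeout");
```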

Task 6: Include category in artifact writer and manifest

  • Add to IndexArtifactEntry, buildIndexArtifactEntry, ResultManifestRecord, hydrateManifestRecord
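
As a rough picture of the JSONL round trip, the sketch below writes an entry as one JSON line and reads it back. `IndexEntry`, `toJsonlLine`, and `hydrate` are simplified, hypothetical counterparts of the names in this task, and the `testId` and `dataset` fields are assumptions.

```typescript
interface IndexEntry {
  readonly testId: string;
  readonly dataset: string;
  readonly category?: string;
}

// JSON.stringify omits properties whose value is undefined, so entries
// without a category produce the same line shape as before this change.
function toJsonlLine(entry: IndexEntry): string {
  return JSON.stringify(entry);
}

// Reads one line back; a missing or non-string category is left absent.
function hydrate(line: string): IndexEntry {
  const raw = JSON.parse(line) as Record<string, unknown>;
  const base = { testId: String(raw["testId"]), dataset: String(raw["dataset"]) };
  const category = raw["category"];
  return typeof category === "string" ? { ...base, category } : base;
}

const roundTripped = hydrate(
  toJsonlLine({ testId: "t1", dataset: "jailbreak", category: "safety" }),
);
const legacy = hydrate('{"testId":"t2","dataset":"smoke"}');
```

Leaving the field absent on hydration, rather than filling in a default here, keeps the fallback decision with whoever displays the record; whether the real `hydrateManifestRecord` does the same is not stated in this PR.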

Task 7: Add categories API endpoints

  • GET /api/runs/:filename/categories — list categories with stats
  • GET /api/runs/:filename/categories/:category/datasets — datasets within a category
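
The "list categories with stats" endpoint could be backed by an aggregation along these lines. The fields of `CategorySummary` shown here (`name`, `total`, `passed`, `datasetCount`) and the `ResultRecord` shape are guesses for illustration; the PR names the type but does not list its members.

```typescript
const DEFAULT_CATEGORY = "Uncategorized";

// Hypothetical per-test record as read from a run file.
interface ResultRecord {
  readonly dataset: string;
  readonly passed: boolean;
  readonly category?: string;
}

// Assumed summary shape.
interface CategorySummary {
  readonly name: string;
  readonly total: number;
  readonly passed: number;
  readonly datasetCount: number;
}

function summarizeCategories(records: readonly ResultRecord[]): CategorySummary[] {
  const buckets = new Map<string, { total: number; passed: number; datasets: Set<string> }>();
  for (const record of records) {
    const name = record.category ?? DEFAULT_CATEGORY; // old records have no category
    let bucket = buckets.get(name);
    if (bucket === undefined) {
      bucket = { total: 0, passed: 0, datasets: new Set<string>() };
      buckets.set(name, bucket);
    }
    bucket.total += 1;
    if (record.passed) bucket.passed += 1;
    bucket.datasets.add(record.dataset);
  }
  return Array.from(buckets, ([name, b]) => ({
    name,
    total: b.total,
    passed: b.passed,
    datasetCount: b.datasets.size,
  }));
}

const summaries = summarizeCategories([
  { dataset: "jailbreak", passed: true, category: "safety" },
  { dataset: "pii", passed: false, category: "safety" },
  { dataset: "smoke", passed: true },
]);
```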

Task 8: Add Studio types and API hooks

  • CategorySummary, CategoriesResponse types
  • useRunCategories, useCategoryDatasets hooks
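
The hooks themselves are not shown in this PR. The URL construction they would need can be sketched as plain functions; `categoriesUrl` and `categoryDatasetsUrl` are hypothetical helper names, and percent-encoding the category is an assumption that matters only if a category can contain `/`.

```typescript
// Builds the path for GET /api/runs/:filename/categories.
function categoriesUrl(filename: string): string {
  return `/api/runs/${encodeURIComponent(filename)}/categories`;
}

// Builds the path for GET /api/runs/:filename/categories/:category/datasets.
// Encoding keeps a nested category such as "safety/jailbreak" in one segment.
function categoryDatasetsUrl(filename: string, category: string): string {
  return `${categoriesUrl(filename)}/${encodeURIComponent(category)}/datasets`;
}

const listUrl = categoriesUrl("run-1.jsonl");
const datasetsUrl = categoryDatasetsUrl("run-1.jsonl", "safety/jailbreak");
```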

Task 9: Update RunDetail with category grouping

  • Group by category first (collapsible sections), then datasets within
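
The two-level grouping behind the collapsible sections can be sketched without any UI code. `ResultRow` and `groupByCategoryThenDataset` are illustrative names, and the fallback to `"Uncategorized"` at this layer is an assumption based on the test plan.

```typescript
const DEFAULT_CATEGORY = "Uncategorized";

// Hypothetical row shape for a single eval result.
interface ResultRow {
  readonly testId: string;
  readonly dataset: string;
  readonly category?: string;
}

// Outer map: category -> inner map: dataset -> rows.
function groupByCategoryThenDataset(
  rows: readonly ResultRow[],
): Map<string, Map<string, ResultRow[]>> {
  const grouped = new Map<string, Map<string, ResultRow[]>>();
  for (const row of rows) {
    const category = row.category ?? DEFAULT_CATEGORY;
    const datasets = grouped.get(category) ?? new Map<string, ResultRow[]>();
    grouped.set(category, datasets);
    const bucket = datasets.get(row.dataset) ?? [];
    datasets.set(row.dataset, bucket);
    bucket.push(row);
  }
  return grouped;
}

const grouped = groupByCategoryThenDataset([
  { testId: "t1", dataset: "jailbreak", category: "safety" },
  { testId: "t2", dataset: "jailbreak", category: "safety" },
  { testId: "t3", dataset: "smoke" },
]);
```

A `Map` preserves insertion order, so sections would render in the order categories first appear unless the UI sorts them.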

Task 10: Add category route in Studio

  • New route: /runs/$runId/category/$category showing datasets in that category

Task 11: Update Breadcrumbs and Sidebar

  • Category segment in breadcrumbs
  • CategorySidebar showing datasets within a category
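
A breadcrumb trail with the new category segment might be assembled as follows. Only the `/runs/$runId/category/$category` route comes from this PR; the dataset and eval path segments, the `StudioLocation` shape, and `buildBreadcrumbs` itself are assumptions made for the sketch.

```typescript
interface Crumb {
  readonly label: string;
  readonly href: string;
}

// Hypothetical description of where the user is in Studio.
interface StudioLocation {
  readonly runId: string;
  readonly category?: string;
  readonly dataset?: string;
  readonly evalId?: string;
}

function buildBreadcrumbs(loc: StudioLocation): Crumb[] {
  const crumbs: Crumb[] = [{ label: "Home", href: "/" }];

  const runHref = `/runs/${encodeURIComponent(loc.runId)}`;
  crumbs.push({ label: loc.runId, href: runHref });
  if (loc.category === undefined) return crumbs;

  // Matches the route added in Task 10.
  const categoryHref = `${runHref}/category/${encodeURIComponent(loc.category)}`;
  crumbs.push({ label: loc.category, href: categoryHref });
  if (loc.dataset === undefined) return crumbs;

  // Assumed path shape for the deeper levels.
  const datasetHref = `${categoryHref}/dataset/${encodeURIComponent(loc.dataset)}`;
  crumbs.push({ label: loc.dataset, href: datasetHref });
  if (loc.evalId === undefined) return crumbs;

  crumbs.push({
    label: loc.evalId,
    href: `${datasetHref}/eval/${encodeURIComponent(loc.evalId)}`,
  });
  return crumbs;
}

const trail = buildBreadcrumbs({
  runId: "run-1",
  category: "safety",
  dataset: "jailbreak",
  evalId: "t1",
});
const trailLabels = trail.map((crumb) => crumb.label).join(" > ");
```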

Task 12: Full verification

  • All tests pass (1625+)
  • Build clean
  • Lint clean
  • Unit tests for deriveCategory (8 tests)

Test plan

  • Eval files in subdirectories get correct category in JSONL output
  • Eval files at root level default to Uncategorized
  • Studio run detail groups datasets under category headers
  • Click category → see datasets → click dataset → see evals
  • Breadcrumbs show: Home > Run > Category > Dataset > Eval
  • Old JSONL files without category render under Uncategorized
  • All existing tests pass

🤖 Generated with Claude Code

Placeholder commit to open draft PR. See PR description for implementation plan.

Closes #813
cloudflare-workers-and-pages bot commented Mar 28, 2026

Deploying agentv with Cloudflare Pages

Latest commit: f245b56
Status: ✅  Deploy successful!
Preview URL: https://4f57f656.agentv.pages.dev
Branch Preview URL: https://feat-category-field.agentv.pages.dev

christso added 13 commits March 28, 2026 13:46

  • Add `readonly category?: string` to both interfaces to support path-derived categorization of eval tests throughout the pipeline.
  • Move category derivation logic from CLI discover.ts into a shared core module so it can be reused by the YAML parser and run-eval.
  • Add category to LoadOptions and pass it through to constructed EvalTest objects so tests carry their file-derived category.
  • Compute category from the eval file's relative path and forward it to loadTestSuite so each test gets its category assigned.
  • Include evalCase.category in all result-building paths (success, budget-exceeded, fail-on-error, and error results).
  • Add category field to IndexArtifactEntry and ResultManifestRecord so it flows through JSONL output and manifest hydration.

@christso christso marked this pull request as ready for review March 28, 2026 14:08
@christso christso merged commit b2f5471 into main Mar 28, 2026
2 checks passed
@christso christso deleted the feat/category-field branch March 28, 2026 21:36

Development

Successfully merging this pull request may close these issues.

feat(eval): add category field to eval YAML for hierarchical grouping