feat(eval): add path-derived category field for hierarchical grouping #816
Merged
Placeholder commit to open draft PR. See PR description for implementation plan. Closes #813
Deploying agentv with
| Latest commit: | f245b56 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4f57f656.agentv.pages.dev |
| Branch Preview URL: | https://feat-category-field.agentv.pages.dev |
Add `readonly category?: string` to the `EvalTest` and `EvaluationResult` interfaces to support path-derived categorization of eval tests throughout the pipeline.
Move category derivation logic from CLI discover.ts into a shared core module so it can be reused by the YAML parser and run-eval.
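For reviewers skimming the commit list, here is a minimal sketch of what a path-derived category helper could look like. It is an illustration, not the code in `packages/core/src/evaluation/category.ts`: it assumes the category is the directory portion of the eval file's relative path and that files at the top level fall back to the default. Only the names `deriveCategory`, `DEFAULT_CATEGORY`, and the `"Uncategorized"` value come from this PR.

```typescript
// Illustrative sketch only; the real deriveCategory in @agentv/core may differ.
const DEFAULT_CATEGORY = "Uncategorized";

/**
 * Derive a category from an eval file's path relative to the eval root.
 * Assumption: the category is the directory hierarchy joined with "/".
 */
function deriveCategory(relativePath: string): string {
  const segments = relativePath
    .replace(/\\/g, "/") // normalize Windows separators
    .split("/")
    .filter((segment) => segment.length > 0 && segment !== ".");
  // Drop the file name; whatever remains is the directory hierarchy.
  const directories = segments.slice(0, -1);
  return directories.length > 0 ? directories.join("/") : DEFAULT_CATEGORY;
}
```

Under these assumptions, `deriveCategory("safety/jailbreak/basic.eval.yaml")` yields `"safety/jailbreak"`, while a top-level file such as `basic.eval.yaml` yields `"Uncategorized"`.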
Add category to LoadOptions and pass it through to constructed EvalTest objects so tests carry their file-derived category.
Compute category from the eval file's relative path and forward it to loadTestSuite so each test gets its category assigned.
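The hand-off described in the two commits above can be sketched roughly as follows. This is a simplified stand-in, not the actual parser: `buildTests` and the `RawTest` shape are hypothetical, and the real `loadTestSuite` reads YAML from disk rather than taking pre-parsed objects. Only `LoadOptions`, `EvalTest`, and the optional `category` field are taken from the PR.

```typescript
// Hypothetical shapes: only `category` is confirmed by the PR.
interface EvalTest {
  readonly id: string;
  readonly category?: string;
}

interface LoadOptions {
  readonly category?: string;
}

// Hypothetical stand-in for a parsed YAML test entry.
interface RawTest {
  readonly id: string;
}

/** Build EvalTest objects, stamping each with the caller-supplied category. */
function buildTests(rawTests: readonly RawTest[], options: LoadOptions = {}): EvalTest[] {
  return rawTests.map((raw) =>
    options.category !== undefined
      ? { id: raw.id, category: options.category }
      : { id: raw.id },
  );
}
```

Keeping `category` optional on both types means callers that never pass it (for example, programmatic users of the parser) continue to work unchanged.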
Include evalCase.category in all result-building paths (success, budget-exceeded, fail-on-error, and error results).
Add category field to IndexArtifactEntry and ResultManifestRecord so it flows through JSONL output and manifest hydration.
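As a rough illustration of how the field might survive a JSONL round trip, the sketch below parses result lines and fills in a default when `category` is missing (for instance, in results written before this change). The `testId` field, the `HydratedRecord` type, and the `hydrateRecords` helper are assumptions made for the example; the PR only states that `category` is added to `IndexArtifactEntry` and `ResultManifestRecord`, and the real hydration code may apply the fallback elsewhere.

```typescript
// Assumed fallback value, matching the default category named in the PR.
const FALLBACK_CATEGORY = "Uncategorized";

// Hypothetical hydrated shape; `testId` is an invented field for illustration.
interface HydratedRecord {
  readonly testId: string;
  readonly category: string;
}

/** Parse JSONL text into records, defaulting a missing or empty category. */
function hydrateRecords(jsonl: string): HydratedRecord[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => {
      const raw = JSON.parse(line) as { testId: string; category?: unknown };
      const category =
        typeof raw.category === "string" && raw.category.length > 0
          ? raw.category
          : FALLBACK_CATEGORY;
      return { testId: raw.testId, category };
    });
}
```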
Summary
- Add `category` field to eval pipeline for hierarchical organization: Category > Dataset > Test ID

Closes #813

Implementation Plan

Task 1: Add `category` to core types
- Add `readonly category?: string` to `EvalTest` and `EvaluationResult` interfaces

Task 2: Extract `deriveCategory` to `@agentv/core`
- Move from `discover.ts` to shared `packages/core/src/evaluation/category.ts`
- Change `"root"` to `"Uncategorized"`
- Add `DEFAULT_CATEGORY` constant

Task 3: Propagate category through YAML parser
- Add `category` to `LoadOptions`, assign to each parsed test case

Task 4: Pass category from CLI run-eval to parser
- Compute category in `prepareFileMetadata`, pass to `loadTestSuite`

Task 5: Pass category through orchestrator
- Include `category: evalCase.category` in `buildEvaluationResultCommon()` and all error paths

Task 6: Include category in artifact writer and manifest
- Add category to `IndexArtifactEntry`, `buildIndexArtifactEntry`, `ResultManifestRecord`, `hydrateManifestRecord`

Task 7: Add categories API endpoints
- `GET /api/runs/:filename/categories`: list categories with stats
- `GET /api/runs/:filename/categories/:category/datasets`: datasets within a category

Task 8: Add Studio types and API hooks
- `CategorySummary`, `CategoriesResponse` types
- `useRunCategories`, `useCategoryDatasets` hooks

Task 9: Update RunDetail with category grouping

Task 10: Add category route in Studio
- `/runs/$runId/category/$category` showing datasets in that category

Task 11: Update Breadcrumbs and Sidebar
- `CategorySidebar` showing datasets within a category

Task 12: Full verification
- Unit tests for `deriveCategory` (8 tests)

Test plan
- Top-level eval files are assigned `Uncategorized`
- Results without `category` render under `Uncategorized`

🤖 Generated with Claude Code
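To make the "list categories with stats" endpoint from Task 7 more concrete, here is a small sketch of the grouping step such a handler might perform. Everything about the shapes is assumed: the PR names a `CategorySummary` type but does not show its fields, so `name`, `total`, and `datasetCount` are hypothetical, as are the `ResultRow` type and the `summarizeCategories` helper.

```typescript
// Hypothetical result row; only `category` is confirmed by the PR.
interface ResultRow {
  readonly category?: string;
  readonly dataset: string;
  readonly testId: string;
}

// Hypothetical fields; the real CategorySummary may carry different stats.
interface CategorySummary {
  readonly name: string;
  readonly total: number;
  readonly datasetCount: number;
}

/** Group results by category, counting tests and distinct datasets. */
function summarizeCategories(rows: readonly ResultRow[]): CategorySummary[] {
  const groups = new Map<string, { total: number; datasets: Set<string> }>();
  for (const row of rows) {
    // Rows without a category are grouped under the default name.
    const name = row.category ?? "Uncategorized";
    let group = groups.get(name);
    if (group === undefined) {
      group = { total: 0, datasets: new Set<string>() };
      groups.set(name, group);
    }
    group.total += 1;
    group.datasets.add(row.dataset);
  }
  return Array.from(groups.entries())
    .map(([name, group]) => ({
      name,
      total: group.total,
      datasetCount: group.datasets.size,
    }))
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```

The same grouping, filtered to one category and keyed by `dataset`, would cover the second endpoint (datasets within a category).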