feat(eval): add category field to eval YAML for hierarchical grouping

## Summary

Add an optional `category` field to the eval YAML schema to enable hierarchical organization: **Category > Dataset > Test ID**. This introduces a higher-level grouping above datasets (eval files) for projects with many eval files.

## Motivation

Real-world projects accumulate dozens of eval YAML files. Without a grouping mechanism above the dataset level, the Studio run detail page becomes a flat list. A `category` field provides lightweight, optional organization similar to how convex-evals uses directory-based categories (`000-fundamentals`, `001-data_modeling`).

### Proposed hierarchy

| Level | Source | Example |
|---|---|---|
| **Category** | `category` field in eval YAML (defaults to `"default"`) | `"Fundamentals"`, `"Advanced"`, `"Regression"` |
| **Dataset** | `name` field or filename of eval YAML (was `eval_set`) | `"greeting-tests"`, `"math-benchmark"` |
| **Test ID** | `id` field on individual test | `"test-greeting"`, `"test-addition"` |

### Example YAML

```yaml
name: greeting-tests
category: Fundamentals
description: Basic greeting and politeness tests

tests:
  - id: test-greeting
    criteria: Agent should greet the user
    # ...
```

If `category` is omitted, it defaults to `"default"`.

## Objective

Add `category` as a suite-level field in eval YAML, propagate it through the pipeline to results, and add a two-level drill-down in Studio (Category > Dataset > Eval).

## Design latitude

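- **Default category name**: This issue uses `"default"` throughout, but the implementer can choose `"Uncategorized"` or `"General"` instead. Pick whichever reads best in the UI and apply it consistently wherever this issue says `"default"`.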
- **Studio UI grouping**: The run detail page can use collapsible sections, a two-column layout, or a separate route per category. Collapsible sections are simplest.
- **API shape**: The implementer can either nest datasets inside the categories response or keep them as separate endpoints. Separate endpoints are preferred for consistency.

## Key files to change

### Schema & types (packages/core/src/)
- `evaluation/validation/eval-file.schema.ts` — add `category: z.string().optional()` at suite level
- `evaluation/types.ts` — add `category?: string` to `EvalTest` and `EvaluationResult`
- `evaluation/yaml-parser.ts` — read `category` from suite object, default to `"default"`, assign to each test case (near line ~268 where `evalSetName` is extracted)
- `evaluation/orchestrator.ts` — pass `category` through to results
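
The parser step in the list above could look roughly like this. It is a minimal sketch, not the existing `yaml-parser.ts` code: `RawSuite`, `RawTest`, `CategorizedTest`, `resolveCategory`, and `assignCategory` are hypothetical names, and treating a blank string as "missing" is an assumption this issue does not specify.

```typescript
// Hypothetical, trimmed-down shapes; the real EvalTest type has more fields.
interface RawTest {
  id: string;
  criteria?: string;
}

interface RawSuite {
  name?: string;
  category?: string;
  tests: RawTest[];
}

interface CategorizedTest extends RawTest {
  category: string;
}

// Assumed default; the final name is left to the implementer.
const DEFAULT_CATEGORY = "default";

// Resolve the suite-level category, falling back to the default when the
// field is missing or blank (blank handling is an assumption).
function resolveCategory(suite: RawSuite): string {
  const trimmed = suite.category?.trim();
  return trimmed ? trimmed : DEFAULT_CATEGORY;
}

// Copy the resolved category onto every test case in the suite.
function assignCategory(suite: RawSuite): CategorizedTest[] {
  const category = resolveCategory(suite);
  return suite.tests.map((test) => ({ ...test, category }));
}
```

Resolving the category once per suite and copying it onto each test keeps downstream code (orchestrator, artifact writer) free of defaulting logic.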

### Artifact pipeline (apps/cli/src/commands/)
- `eval/artifact-writer.ts` — include `category` in index manifest entries
- `results/manifest.ts` — add `category` to `ResultManifestRecord`, hydrate into `EvaluationResult`
- `results/serve.ts` — new endpoint: `GET /api/runs/:filename/categories/:category/datasets`; modify existing categories endpoint to group by the new `category` field instead of `dataset`
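
As a rough illustration of the grouping behind these endpoints, the sketch below groups manifest records by category and lists the datasets inside one category. `ManifestRecord`, `CategorySummary`, `groupByCategory`, and `datasetsInCategory` are hypothetical names, and the response shape is an assumption. Records without a `category` (older JSONL files) fall back to the default.

```typescript
// Hypothetical, trimmed-down record; the real ResultManifestRecord has more fields.
interface ManifestRecord {
  test_id: string;
  dataset: string;
  category?: string; // absent in files written before this change
}

// Assumed response shape for the categories endpoint.
interface CategorySummary {
  category: string;
  datasets: string[];
}

const FALLBACK_CATEGORY = "default";

// Group records by category, collecting the distinct datasets in each.
// Map and Set preserve insertion order, so output follows first appearance.
function groupByCategory(records: ManifestRecord[]): CategorySummary[] {
  const byCategory = new Map<string, Set<string>>();
  for (const record of records) {
    const category = record.category ?? FALLBACK_CATEGORY;
    let datasets = byCategory.get(category);
    if (!datasets) {
      datasets = new Set<string>();
      byCategory.set(category, datasets);
    }
    datasets.add(record.dataset);
  }
  return Array.from(byCategory, ([category, datasets]) => ({
    category,
    datasets: Array.from(datasets),
  }));
}

// Datasets for a single category, as the new per-category endpoint might return.
function datasetsInCategory(records: ManifestRecord[], category: string): string[] {
  const match = groupByCategory(records).find((c) => c.category === category);
  return match ? match.datasets : [];
}
```

Applying the fallback at read time means old result files need no migration.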

### Studio (apps/studio/src/) — assumes #812 is merged first
- `lib/types.ts` — add `category` to `EvalResult`; new `CategoryWithDatasets` response type
- `lib/api.ts` — new `useCategoryDatasets(runId, category)` hook
- `components/RunDetail.tsx` — group by category first, then show dataset cards within each category section
- `routes/runs/$runId_.category.$category.tsx` — show datasets in that category (not individual evals)
- New route: `routes/runs/$runId_.category.$category.dataset.$dataset.tsx` — show evals in that dataset
- `components/Breadcrumbs.tsx` — add dataset segment to breadcrumb trail
- `components/Sidebar.tsx` — update drill-down: category sidebar shows datasets, dataset sidebar shows evals
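
The breadcrumb and drill-down changes above could share one helper that derives the trail from the current location. This is a sketch under assumptions: `Crumb`, `DrillDownLocation`, and `buildBreadcrumbs` are hypothetical names, and the URL paths are inferred from the route filenames listed above rather than confirmed. Category and dataset names are URL-encoded because values such as `"Data Modeling"` may contain spaces.

```typescript
interface Crumb {
  label: string;
  href?: string; // omitted for a crumb with no known route
}

// Hypothetical description of where the user is in the drill-down.
interface DrillDownLocation {
  runId: string;
  category?: string;
  dataset?: string;
  testId?: string;
}

// Build the trail Home > Run > Category > Dataset > Eval, stopping at the
// deepest level present in the location.
function buildBreadcrumbs(loc: DrillDownLocation): Crumb[] {
  const crumbs: Crumb[] = [{ label: "Home", href: "/" }];

  let href = `/runs/${encodeURIComponent(loc.runId)}`;
  crumbs.push({ label: loc.runId, href });
  if (loc.category === undefined) return crumbs;

  href += `/category/${encodeURIComponent(loc.category)}`;
  crumbs.push({ label: loc.category, href });
  if (loc.dataset === undefined) return crumbs;

  href += `/dataset/${encodeURIComponent(loc.dataset)}`;
  crumbs.push({ label: loc.dataset, href });
  if (loc.testId === undefined) return crumbs;

  // No eval route is named in this issue, so the eval crumb carries no link.
  crumbs.push({ label: loc.testId });
  return crumbs;
}
```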

## Acceptance signals

- [ ] `category: Fundamentals` in eval YAML appears in JSONL output and Studio
- [ ] Eval YAML without a `category` field is assigned the `"default"` category
- [ ] Studio run detail groups datasets under category headers
- [ ] Drill-down: click category → see datasets → click dataset → see evals
- [ ] Breadcrumbs show full path: Home > Run > Category > Dataset > Eval
- [ ] All existing tests pass (no regressions)
- [ ] Old JSONL files without a `category` field render under the default category

## Non-goals

- Nested categories (single level only)
- Auto-inferring category from directory structure
- Changing experiment or target semantics

## Related

- #812 — Rename `eval_set` to `dataset` (prerequisite — must be merged first)
- #810 — Studio feature parity (current implementation uses eval_set as the sole grouping level)
