Merged
52 changes: 24 additions & 28 deletions AGENTS.md
@@ -1,28 +1,24 @@
# AGENTS.md

## Working rules

- Inspect existing files before editing.
- Make minimal coherent changes.
- Prioritize an end-to-end runnable MVP over polish.
- Do not present the repo as production-ready.
- Run tests after code changes.

## Project focus

- Timestamped event streams
- Sliding-window aggregation
- Telemetry features
- Simple rule-based alerts
- Reproducible outputs from sample data

## Review guidelines

- Treat README and documentation mismatches against actual CLI/runtime behavior as high-priority findings.
- Check all input-format claims against the real loader implementation.
- Treat missing edge-case tests as important review findings when behavior depends on time parsing, window boundaries, or alert thresholds.
- Prefer correcting documentation to match real behavior unless the code path is accidental or deprecated.
- Flag alerting logic that is obviously too noisy for the bundled sample dataset.
- Prefer small, scoped fixes over broad refactors during PR review.
- Do not request production-grade features in a portfolio prototype unless the PR explicitly aims to add them.
- When reviewing plots, outputs, and examples, verify that referenced files and commands actually exist.
# AGENTS.md

## Working rules

- Inspect existing files before editing.
- Make minimal coherent changes.
- Prefer small, reviewable pull requests.
- Prioritize correctness, reproducibility, and README accuracy over polish.
- Do not present the repo as production-ready.

## Build and test

- Install: `python -m pip install -e .`
- Test: `pytest`
- Demo run: `python -m telemetry_window_demo.cli run --config configs/default.yaml`

## Review guidelines

- Treat README or docs mismatches against actual CLI/runtime behavior as important findings.
- Check input-format claims against the real loader implementation.
- Treat missing edge-case tests as important findings when behavior depends on time parsing, window boundaries, or alert thresholds.
- Flag alerting logic that is obviously too noisy for the bundled sample dataset.
- Prefer small, scoped fixes over broad refactors during review.
- Verify that referenced commands, files, and output artifacts actually exist.
214 changes: 112 additions & 102 deletions README.md
@@ -1,107 +1,117 @@
# telemetry-lab

[![CI](https://github.com/stacknil/telemetry-lab/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/telemetry-lab/actions/workflows/ci.yml)

# telemetry-lab
[![CI](https://github.com/stacknil/telemetry-lab/actions/workflows/ci.yml/badge.svg)](https://github.com/stacknil/telemetry-lab/actions/workflows/ci.yml)
Small portfolio prototypes for telemetry analytics, monitoring, and detection-oriented signal processing.

## What This Repo Is

`telemetry-window-demo` is a local Python CLI that turns timestamped event streams into:

- sliding-window feature tables
- cooldown-reduced rule-based alerts
- PNG timeline plots
- machine-readable run summaries
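
The sliding-window feature step can be sketched as follows, assuming events are dicts with a parsed `timestamp` and a `status` field; the function name and window defaults here are illustrative, not the demo's actual API:

```python
from datetime import timedelta

def window_features(events, window_seconds=60, step_seconds=30):
    """Bucket timestamped events into overlapping windows and compute
    simple per-window features. `events` must be non-empty dicts with a
    datetime `timestamp` and a `status` string (illustrative schema)."""
    events = sorted(events, key=lambda e: e["timestamp"])
    window = timedelta(seconds=window_seconds)
    step = timedelta(seconds=step_seconds)
    rows = []
    t = events[0]["timestamp"]
    end = events[-1]["timestamp"]
    while t <= end:
        in_window = [e for e in events if t <= e["timestamp"] < t + window]
        errors = sum(1 for e in in_window if e["status"] == "error")
        rows.append({
            "window_start": t.isoformat(),
            "event_count": len(in_window),
            "error_rate": errors / len(in_window) if in_window else 0.0,
        })
        t += step
    return rows
```

The overlap comes from the step being smaller than the window; with equal step and window, the windows tile without overlap.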

## Quick Run

```bash
python -m pip install -e .
python -m telemetry_window_demo.cli run --config configs/default.yaml
```

That command reads `data/raw/sample_events.jsonl` and regenerates:

- `data/processed/features.csv`
- `data/processed/alerts.csv`
- `data/processed/summary.json`
- `data/processed/event_count_timeline.png`
- `data/processed/error_rate_timeline.png`
- `data/processed/alerts_timeline.png`

With the bundled default sample, the current repo state produces:

- `41` normalized events
- `24` windows
- `12` alerts after a `60` second cooldown

Why it is worth a quick look:

- it shows a full telemetry path from raw events to operator-facing outputs
- the sample inputs and outputs are reproducible in-repo
- a second bundled scenario gives a slightly richer walkthrough without changing the basic CLI flow

![Default alert timeline](data/processed/alerts_timeline.png)

## Demo Variants

Default sample:
## Demos

- config: [`configs/default.yaml`](configs/default.yaml)
- input: `data/raw/sample_events.jsonl`
- outputs: `data/processed/`
- current summary: `41` events, `24` windows, `12` alerts, `summary.json` included
- [telemetry-window-demo](#telemetry-window-demo)
- [ai-assisted-detection-demo](demos/ai-assisted-detection-demo/README.md)

Richer sample:
| Demo | Input | Deterministic core | LLM role | Main artifacts | Guardrails / non-goals |
| --- | --- | --- | --- | --- | --- |
| [telemetry-window-demo](#telemetry-window-demo) | JSONL / CSV events | Windows<br>Features<br>Alert thresholds | None | `features.csv`<br>`alerts.csv`<br>`summary.json`<br>3 PNG plots | MVP only<br>No realtime<br>No case management |
| [ai-assisted-detection-demo](demos/ai-assisted-detection-demo/README.md) | JSONL auth / web / process | Normalize<br>Rules<br>Grouping<br>ATT&CK mapping | JSON-only case drafting | `rule_hits.json`<br>`case_bundles.json`<br>`case_summaries.json`<br>`case_report.md`<br>`audit_traces.jsonl` | Human verification required<br>No autonomous response<br>No final verdict |

- config: [`configs/richer_sample.yaml`](configs/richer_sample.yaml)
- input: `data/raw/richer_sample_events.jsonl`
- outputs: `data/processed/richer_sample/`
- current summary: `28` events, `24` windows, `8` alerts, `summary.json` included

## Input Support

Runtime input support:

- `.jsonl`
- `.csv`

Required fields for both formats on every row or record:

- `timestamp`
- `event_type`
- `source`
- `target`
- `status`
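
For illustration, a record carrying the five required fields might look like this in each format; all values below are made up:

```python
import csv
import io
import json

# One JSONL record with the five required fields (illustrative values).
jsonl_line = json.dumps({
    "timestamp": "2024-01-01T00:00:00Z",
    "event_type": "login",
    "source": "host-a",
    "target": "auth-svc",
    "status": "error",
})

# The equivalent CSV row under the same required header.
csv_text = io.StringIO(
    "timestamp,event_type,source,target,status\n"
    "2024-01-01T00:00:00Z,login,host-a,auth-svc,error\n"
)
rows = list(csv.DictReader(csv_text))
```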

Cooldown behavior:

- repeated alerts are keyed by `(rule_name, scope)`
- scope prefers the first available entity-like field in this order: `entity`, `source`, `target`, `host`
- when no entity-like field is present, cooldown falls back to per-`rule_name` behavior
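
The cooldown rules above can be sketched like this; `alert_scope` and `apply_cooldown` are hypothetical helper names, not the demo's actual functions:

```python
ENTITY_FIELDS = ("entity", "source", "target", "host")

def alert_scope(alert):
    """Pick the first entity-like field present; None falls back to
    per-rule cooldown, matching the documented behavior."""
    for field in ENTITY_FIELDS:
        if alert.get(field):
            return alert[field]
    return None

def apply_cooldown(alerts, cooldown_seconds=60):
    """Suppress repeat alerts for the same (rule_name, scope) key within
    the cooldown interval. `alerts` must be time-sorted dicts with
    `rule_name` and an epoch-seconds `ts` (illustrative schema)."""
    last_fired = {}
    kept = []
    for alert in alerts:
        key = (alert["rule_name"], alert_scope(alert))
        prev = last_fired.get(key)
        if prev is None or alert["ts"] - prev >= cooldown_seconds:
            kept.append(alert)
            last_fired[key] = alert["ts"]
    return kept
```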

## Repo Guide

- [`docs/sample-output.md`](docs/sample-output.md) summarizes the committed sample artifacts
- [`docs/roadmap.md`](docs/roadmap.md) sketches the next demo directions
- [`data/processed/summary.json`](data/processed/summary.json) captures the default run in machine-readable form
- [`data/processed/richer_sample/summary.json`](data/processed/richer_sample/summary.json) captures the richer scenario pack
- [`tests/`](tests/) keeps regression coverage close to the CLI behavior and windowing logic

## Next Demo Directions

- strengthen JSONL and CSV validation so ingestion failures are clearer
- keep reducing repeated alert noise while preserving simple rule-based behavior
- keep sample-output docs and public repo presentation aligned with the checked-in demo state

## Scope

This repository is a portfolio prototype, not a production monitoring system.

## Limitations

- No real-time ingestion
- No streaming state management
- No alert routing or case management
- No dashboard or service deployment
- Sample-data driven only
124 changes: 124 additions & 0 deletions demos/ai-assisted-detection-demo/README.md
@@ -0,0 +1,124 @@
# AI-Assisted Detection Demo

This demo is part of `telemetry-lab` and is intentionally framed as a portfolio-grade security engineering prototype.

It demonstrates constrained AI-assisted case drafting for SOC-style workflows, not autonomous detection or response.

It combines deterministic detections with a tightly constrained LLM stage:

- the rules decide which activity is interesting
- the grouping logic decides which hits belong in the same case
- the LLM is limited to structured summaries, likely causes, uncertainty notes, and suggested next steps

The LLM does **not** make final incident decisions, modify rules, call tools, or execute response actions. Human verification is always required.

## Purpose

The goal is to show a credible bridge between deterministic telemetry analytics and safe analyst assistance.

This is not an autonomous SOC. It is a constrained drafting pipeline that keeps rule logic, ATT&CK mapping, case grouping, and evidence handling deterministic.

## Pipeline

1. ingest sample auth, web, and process events from JSONL
2. normalize them into a shared internal schema
3. apply deterministic detection rules
4. group rule hits into cases by shared entities and time proximity
5. attach ATT&CK mappings from rule metadata
6. build a case bundle with raw evidence, rule hits, severity, and evidence highlights
7. pass the case bundle to a constrained local demo LLM adapter with strict instruction and data separation
8. require JSON-only output against a local schema
9. validate the response and reject invalid output
10. emit analyst-facing artifacts and audit traces
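
Steps 3 and 4 (grouping deterministic rule hits into cases by shared entity and time proximity) can be sketched as follows, assuming hits carry an `entity` and an epoch-seconds `ts`; the names and gap threshold are illustrative:

```python
def group_hits_into_cases(hits, gap_seconds=300):
    """Group rule hits that share an entity and occur within
    `gap_seconds` of the previous hit for that entity."""
    cases = []
    open_cases = {}  # entity -> index of its most recent case
    for hit in sorted(hits, key=lambda h: h["ts"]):
        idx = open_cases.get(hit["entity"])
        if idx is not None and hit["ts"] - cases[idx]["last_ts"] <= gap_seconds:
            cases[idx]["hits"].append(hit)
            cases[idx]["last_ts"] = hit["ts"]
        else:
            # Too far apart in time, or a new entity: open a fresh case.
            open_cases[hit["entity"]] = len(cases)
            cases.append({"entity": hit["entity"], "hits": [hit], "last_ts": hit["ts"]})
    return cases
```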

## Guardrails

- telemetry content is marked as untrusted data
- system instructions are separated from the evidence payload
- the response must pass local JSON schema validation
- the response must pass a semantic validation layer after schema validation
- the `human_verification` field is mandatory and must be set to `required`
- no external tool use is allowed in the LLM stage
- no automated response actions are allowed
- forbidden action-taking or final-verdict language is rejected and recorded
- summaries are rejected if the returned `case_id` does not exactly match the input case bundle
- a prompt-injection-like sample event is included and treated as telemetry, not instruction
- rejected summaries are fail-closed: they do not enter `case_summaries.json`
- accepted and rejected outcomes are both recorded in `audit_traces.jsonl`
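
A fail-closed validation layer along these lines might look like the following sketch; the field names, phrase list, and rejection-reason strings are assumptions for illustration, not the demo's exact configuration:

```python
import json

REQUIRED_FIELDS = ("case_id", "summary", "human_verification")
FORBIDDEN_PHRASES = ("we blocked", "confirmed compromise", "isolate the host")

def validate_summary(raw_response, expected_case_id):
    """Validate an LLM case summary; return (accepted, rejection_reason).
    Any failure rejects the summary rather than repairing it."""
    try:
        doc = json.loads(raw_response)
    except json.JSONDecodeError:
        return False, "malformed_json"
    if any(field not in doc for field in REQUIRED_FIELDS):
        return False, "missing_required_fields"
    if doc["case_id"] != expected_case_id:
        return False, "case_id_mismatch"
    if doc["human_verification"] != "required":
        return False, "semantic_validation_failed"
    text = json.dumps(doc).lower()
    if any(phrase in text for phrase in FORBIDDEN_PHRASES):
        return False, "semantic_validation_failed"
    return True, None
```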

## Quick start

From the repository root:

```bash
python -m pip install -e .
python -m telemetry_window_demo.cli run-ai-demo
```

Generated artifacts are written to `demos/ai-assisted-detection-demo/artifacts/`.

## Demo inputs

- sample data: `data/raw/sample_security_events.jsonl`
- deterministic rules: `config/rules.yaml`
- structured output schema: `config/llm_case_output_schema.json`

## Expected artifacts

- `artifacts/rule_hits.json`
- `artifacts/case_bundles.json`
- `artifacts/case_summaries.json`
- `artifacts/case_report.md`
- `artifacts/audit_traces.jsonl`

The bundled sample data is designed to produce at least three generated cases.

## Artifact semantics

- `rule_hits.json`: deterministic rule hits with rule metadata, ATT&CK mapping, entities, and evidence highlights
- `case_bundles.json`: grouped cases with severity, rule hits, ATT&CK mappings, raw evidence, and untrusted-data marking
- `case_summaries.json`: only accepted JSON summaries that passed schema and semantic validation
- `case_report.md`: analyst-facing report that shows accepted summaries, explicitly notes rejected case summaries, and includes a top-level run integrity section that surfaces rule/config degradation
- `audit_traces.jsonl`: stable per-record audit log for accepted and rejected paths, using `schema_version = ai-assisted-detection-audit/v1` and including `ts`, `case_id`, `validation_status`, `rejection_reason`, `rule_ids`, `prompt_input_digest`, `evidence_digest`, and bounded response excerpts
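
For orientation, a hypothetical accepted-path audit record using the documented fields might serialize like this; every value below is illustrative:

```python
import json

# A made-up accepted-path record; field names follow the documented
# audit schema, but all values here are placeholders.
record = {
    "schema_version": "ai-assisted-detection-audit/v1",
    "ts": "2024-01-01T00:00:00Z",
    "case_id": "CASE-001",
    "validation_status": "accepted",
    "rejection_reason": None,
    "rule_ids": ["R-AUTH-001"],
    "prompt_input_digest": "sha256:<digest>",
    "evidence_digest": "sha256:<digest>",
    "response_excerpt": "<bounded excerpt>",
}
line = json.dumps(record)  # one JSONL line per accepted or rejected outcome
```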

## Rejection behavior

- non-JSON or malformed JSON responses are rejected and recorded
- missing required fields or invalid enum values are rejected and recorded
- schema-valid summaries with the wrong `case_id` are rejected and recorded
- action-taking language is rejected
- final-verdict or confirmed-compromise language is rejected
- malformed rule or ATT&CK metadata is rejected before detection logic uses it

Rejected outputs do not become analyst summaries. Analysts can still inspect deterministic evidence through `case_bundles.json`, `case_report.md`, and `audit_traces.jsonl`.

## Reviewer walkthrough

### Accepted summary path

Use the default sample run artifacts in `artifacts/case_summaries.json`, `artifacts/case_report.md`, and `artifacts/audit_traces.jsonl`.

Verify that `CASE-001` appears in all three places, that the `case_id` matches exactly, that `human_verification` is `required`, and that the audit record shows `validation_status = accepted` with `schema_version = ai-assisted-detection-audit/v1`.

### Rejected summary path

Run `pytest tests/test_ai_assisted_detection_demo.py -k "audit_traces_capture_accepted_and_rejected_paths or case_id_mismatch"` and inspect the `case_report.md`, `case_summaries.json`, and `audit_traces.jsonl` artifacts written by the test.

Verify that the rejected case is absent from `case_summaries.json`, appears in `case_report.md` as `Summary status: rejected`, and has an audit record with `validation_status = rejected` plus a concrete `rejection_reason` such as `missing_required_fields`, `semantic_validation_failed`, or `case_id_mismatch`.

### Degraded coverage path

Run `pytest tests/test_ai_assisted_detection_demo.py -k malformed_attack_metadata_is_rejected_and_recorded` and inspect the generated `case_report.md` and `audit_traces.jsonl`.

Verify that `case_report.md` exposes `## Run Integrity`, `coverage_degraded: yes`, and the rejected rule id, and that `audit_traces.jsonl` contains a global rejection record with `case_id = null` and `rejection_reason = rule_metadata_validation_failed`.

## Limitations

- the LLM stage is a constrained local demo adapter, not a production model integration
- detections are intentionally small and rule-based
- grouping is simple and optimized for readability over recall
- sample telemetry is synthetic and limited in volume
- there is no ticketing, SOAR, sandboxing, or live data ingestion
- artifacts are for analyst review only and do not represent final incident disposition
- rejection logic is intentionally conservative and favors fail-closed behavior over model flexibility
1 change: 1 addition & 0 deletions demos/ai-assisted-detection-demo/artifacts/.gitkeep
@@ -0,0 +1 @@
