From 51dbc454445def636074326fa8545eefa802d075 Mon Sep 17 00:00:00 2001
From: Nizar Alrifai <nizar@factory.ai>
Date: Mon, 23 Mar 2026 14:46:01 -0700
Subject: [PATCH 1/4] Add review skill to core plugin

Adds a review skill that can be invoked by humans in TUI sessions
and by models programmatically. Provides a structured review methodology
with bug patterns, analysis discipline, and reporting gates.

Shared methodology is marked with BEGIN/END_SHARED_METHODOLOGY so the
CI action can extract just the methodology for deep review prompts.

TUI path returns findings locally; CI path posts PR comments.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 plugins/core/skills/review/SKILL.md | 156 ++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)
 create mode 100644 plugins/core/skills/review/SKILL.md
diff --git a/plugins/core/skills/review/SKILL.md b/plugins/core/skills/review/SKILL.md
new file mode 100644
index 0000000..a5e3a61
--- /dev/null
+++ b/plugins/core/skills/review/SKILL.md
@@ -0,0 +1,156 @@
+---
+name: review
+version: 1.0.0
+description: |
+  Review code changes and identify high-confidence, actionable bugs. Use when the user wants to:
+  - Review a pull request or branch diff
+  - Find bugs, security issues, or correctness problems in code changes
+  - Get a structured summary of review findings
+---
+
+You are a senior staff software engineer and expert code reviewer.
+
+Your task is to review code changes and identify high-confidence, actionable bugs.
+
+## Getting Started
+
+1. **Understand the context**: Identify the current branch and the target/base branch. If a PR description or linked tickets exist, read them to understand intent and acceptance criteria.
+2. **Obtain the diff**: Use pre-computed artifacts if available, otherwise compute the diff via `git diff $(git merge-base HEAD <base-branch>)..HEAD`.
+3. **Review all changed files**: Do not skip any file. Work through the diff methodically.
+
+<!-- BEGIN_SHARED_METHODOLOGY -->
+
+## Review Focus
+
+- Functional correctness, syntax errors, logic bugs
+- Broken dependencies, contracts, or tests
+- Security issues and performance problems
+
+## Bug Patterns
+
+Only flag issues you are confident about -- avoid speculative or stylistic nitpicks.
+
+High-signal patterns to actively check (only comment when evidenced in the diff):
+
+- **Null/undefined safety**: Dereferences on Optional types, missing-key errors on untrusted JSON payloads, unchecked `.find()` / `array[0]` / `.get()` results
+- **Resource leaks**: Unclosed files, streams, connections; missing cleanup on error paths
+- **Injection vulnerabilities**: SQL injection, XSS, command/template injection, auth/security invariant violations
+- **OAuth/CSRF invariants**: State must be per-flow unpredictable and validated; flag deterministic or missing state checks
+- **Concurrency hazards**: TOCTOU, lost updates, unsafe shared state, process/thread lifecycle bugs
+- **Missing error handling**: For critical operations -- network, persistence, auth, migrations, external APIs
+- **Wrong-variable / shadowing**: Variable name mismatches, contract mismatches (serializer vs validated_data, interface vs abstract method)
+- **Type-assumption bugs**: Numeric ops on datetime/strings, ordering-key type mismatches, comparison of object references instead of values
+- **Offset/cursor/pagination mismatches**: Off-by-one, prev/next behavior, commit semantics
+- **Async/await pitfalls**: `forEach`/`map`/`filter` with async callbacks (fire-and-forget), missing `await` on operations whose side-effects or return values are needed, unhandled promise rejections
+
+## Systematic Analysis Patterns
+
+### Logic & Variable Usage
+
+- Verify correct variable in each conditional clause
+- Check AND vs OR confusion in permission/validation logic
+- Verify return statements return the intended value (not wrapper objects, intermediate variables, or wrong properties)
+- In loops/transformations, confirm variable names match semantic purpose
+
+### Null/Undefined Safety
+
+- For each property access chain (`a.b.c`), verify no intermediate can be null/undefined
+- When Optional types are unwrapped, verify presence is checked first
+- Pay attention to: auth contexts, optional relationships, map/dict lookups, config values
+
+### Type Compatibility & Data Flow
+
+- Trace types flowing into math operations (floor/ceil on datetime = error)
+- Verify comparison operators match types (object reference vs value equality)
+- Check function parameters receive expected types after transformations
+- Verify type consistency across serialization/deserialization boundaries
+
+### Async/Await (JavaScript/TypeScript)
+
+- Flag `forEach`/`map`/`filter` with async callbacks -- these don't await
+- Verify all async calls are awaited when their result or side-effect is needed
+- Check promise chains have proper error handling
+
+### Security
+
+- SSRF: Flag unvalidated URL fetching with user input
+- XSS: Check for unescaped user input in HTML/template contexts
+- Auth/session: OAuth state must be per-request random; CSRF tokens must be verified
+- Input validation: `indexOf()`/`startsWith()` for origin validation can be bypassed
+- Timing: Secret/token comparison should use constant-time functions
+- Cache poisoning: Security decisions shouldn't be cached asymmetrically
+
+### Concurrency (when applicable)
+
+- Shared state modified without synchronization
+- Double-checked locking that doesn't re-check after acquiring lock
+- Non-atomic read-modify-write on shared counters
+
+### API Contract & Breaking Changes
+
+- When serializers/validators change: verify response structure remains compatible
+- When DB schemas change: verify migrations include data backfill
+- When function signatures change: grep for all callers to verify compatibility
+
+## Analysis Discipline
+
+Before flagging an issue:
+
+1. Verify with Grep/Read -- do not speculate
+2. Trace data flow to confirm a real trigger path
+3. Check whether the pattern exists elsewhere (may be intentional)
+4. For tests: verify test assumptions match production behavior
+
+## Reporting Gate
+
+### Report if at least one is true
+
+- Definite runtime failure (TypeError, KeyError, ImportError, etc.)
+- Incorrect logic with a clear trigger path and observable wrong result
+- Security vulnerability with a realistic exploit path
+- Data corruption or loss
+- Breaking contract change (API/response/schema/validator) discoverable in code, tests, or docs
+
+### Do NOT report
+
+- Test code hygiene (unused vars, setup patterns) unless it causes test failure
+- Defensive "what-if" scenarios without a realistic trigger
+- Cosmetic issues (message text, naming, formatting)
+- Suggestions to "add guards" or "be safer" without a concrete failure path
+
+### Confidence calibration
+
+- **P0**: Virtually certain of a crash or exploit
+- **P1**: High-confidence correctness or security issue
+- **P2**: Plausible bug but cannot fully verify the trigger path from available context
+- Prefer definite bugs over possible bugs. Report possible bugs only with a realistic execution path.
+
+## Priority Levels
+
+- **[P0]** Blocking -- crash, exploit, data loss
+- **[P1]** Urgent correctness or security issue
+- **[P2]** Real bug with limited impact
+- **[P3]** Minor but real bug
+
+## Finding Format
+
+Each finding should include:
+
+- Priority tag: `[P0]`, `[P1]`, `[P2]`, or `[P3]`
+- Clear imperative title (<=80 chars)
+- One short paragraph explaining *why* it's a bug and *how* it manifests
+- File path and line number
+- Optional: code snippet (<=3 lines) or suggested fix
+
+## Deduplication
+
+- Never flag the same issue twice (same root cause, even at different locations)
+- If an issue was previously reported and appears fixed, note it as resolved
+
+<!-- END_SHARED_METHODOLOGY -->
+
+## Output
+
+Analyze the changes and provide a structured summary of findings. List each finding with its priority, file, line, and description.
+
+Do **not** post inline comments to the PR or submit a GitHub review unless the user explicitly asks for it.

From fc42e5ccf46e3f2d879dae0debb854ee08623326 Mon Sep 17 00:00:00 2001
From: Nizar Alrifai <nizar@factory.ai>
Date: Mon, 23 Mar 2026 15:51:10 -0700
Subject: [PATCH 2/4] docs: add core plugin and review skill to README

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/README.md b/README.md
index 5a572d6..8508c65 100644
--- a/README.md
+++ b/README.md
@@ -24,6 +24,16 @@ Or browse available plugins via the UI:
 
 ## Available Plugins
 
+### core
+
+Core skills for essential functionalities and integrations. Pre-installed by the Droid CLI.
+
+**Skills:**
+
+- `review` - Review code changes and identify high-confidence, actionable bugs. Includes systematic analysis patterns for null safety, async/await, security, concurrency, API contracts, and more. Used by both the CLI `/review` command and the CI action.
+- `session-navigation` - Search and navigate past Droid sessions
+- `init` - Session initialization
+
 ### security-engineer
 
 Security review, threat modeling, and vulnerability validation skills.

From 0bf642befa8e4b28ad9ac25906425f74ed7a63d9 Mon Sep 17 00:00:00 2001
From: Nizar Alrifai <nizar@factory.ai>
Date: Mon, 23 Mar 2026 15:51:10 -0700
Subject: [PATCH 3/4] docs: add core plugin and review skill to README

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 README.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/README.md b/README.md
index 8508c65..899db23 100644
--- a/README.md
+++ b/README.md
@@ -31,8 +31,6 @@ Core skills for essential functionalities and integrations. Pre-installed by the
 **Skills:**
 
 - `review` - Review code changes and identify high-confidence, actionable bugs. Includes systematic analysis patterns for null safety, async/await, security, concurrency, API contracts, and more. Used by both the CLI `/review` command and the CI action.
-- `session-navigation` - Search and navigate past Droid sessions
-- `init` - Session initialization
 
 ### security-engineer
 

From 58060d8a1c9ce9c79fa8c4b76fb8cb255d75e30b Mon Sep 17 00:00:00 2001
From: Nizar Alrifai <nizar@factory.ai>
Date: Mon, 23 Mar 2026 16:06:52 -0700
Subject: [PATCH 4/4] Add full two-pass pipeline to review skill

Include candidate generation (file grouping, parallel subagents, aggregation)
and validation (confidence-based filtering, strict deduplication) in SKILL.md.
This makes the skill the single source of truth for the review process,
used by both the CLI /review command and the CI action.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
---
 plugins/core/skills/review/SKILL.md | 99 ++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 2 deletions(-)

diff --git a/plugins/core/skills/review/SKILL.md b/plugins/core/skills/review/SKILL.md
index a5e3a61..c518245 100644
--- a/plugins/core/skills/review/SKILL.md
+++ b/plugins/core/skills/review/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: review
-version: 1.0.0
+version: 2.0.0
 description: |
   Review code changes and identify high-confidence, actionable bugs. Use when the user wants to:
   - Review a pull request or branch diff
@@ -142,6 +142,18 @@ Each finding should include:
 - File path and line number
 - Optional: code snippet (<=3 lines) or suggested fix
 
+If you have **high confidence** a fix will address the issue and won't break CI, include a suggestion block:
+
+```suggestion
+<replacement code>
+```
+
+Suggestion rules:
+- Keep suggestion blocks <= 100 lines
+- Preserve exact leading whitespace of replaced lines
+- Use RIGHT-side anchors only; do not include removed/LEFT-side lines
+- For insert-only suggestions, repeat the anchor line unchanged, then append new lines
+
 ## Deduplication
 
 - Never flag the same issue twice (same root cause, even at different locations)
@@ -149,8 +161,91 @@ Each finding should include:
 
 <!-- END_SHARED_METHODOLOGY -->
 
+## Two-Pass Review Pipeline
+
+The review process uses two passes: candidate generation and validation.
+
+### Pass 1: Candidate Generation
+
+#### Step 0: Understand the PR intent
+
+1. Read the PR description to understand the purpose and scope of the changes.
+2. If the PR description contains a ticket URL (e.g., Jira, Linear, GitHub issue link) or a ticket ID, **always fetch it** to understand the full requirements and acceptance criteria.
+
+#### Step 1: Triage and group modified files
+
+Before reviewing, triage the PR to enable parallel review:
+
+1. Read the diff to identify ALL modified files
+2. Group files into logical clusters based on:
+   - **Related functionality**: Files in the same module or feature area
+   - **File relationships**: A component and its tests, a class and its interface
+   - **Risk profile**: Security-sensitive files together, database/migration files together
+   - **Dependencies**: Files that import each other or share types
+
+3. Document your grouping briefly, for example:
+   - Group 1 (Auth): src/auth/login.ts, src/auth/session.ts, tests/auth.test.ts
+   - Group 2 (API handlers): src/api/users.ts, src/api/orders.ts
+   - Group 3 (Database): src/db/migrations/001.ts, src/db/schema.ts
+
+Guidelines for grouping:
+- Aim for 3-6 groups to balance parallelism with context coherence
+- Keep related files together so reviewers have full context
+- Each group should be reviewable independently
+
+#### Step 2: Spawn parallel subagents to review each group
+
+Use the Task tool to spawn parallel `file-group-reviewer` subagents. Each subagent reviews one group of files independently.
+
+**IMPORTANT**: Spawn ALL subagents in a single response to enable parallel execution.
+
+For each group, invoke the Task tool with:
+- `subagent_type`: "file-group-reviewer"
+- `description`: Brief label (e.g., "Review auth module")
+- `prompt`: Must include the PR context, the list of assigned files, the relevant diff sections, and instructions to return a JSON array of findings
+
+#### Step 3: Aggregate subagent results
+
+After all subagents complete, collect and merge their findings:
+
+1. **Collect results**: Each subagent returns a JSON array of comment objects
+2. **Merge arrays**: Combine all arrays into a single comments array
+3. **Deduplicate**: If multiple subagents flagged the same location (same path + line), keep only one comment (prefer higher priority: P0 > P1 > P2)
+4. **Filter existing**: Remove any comments that duplicate issues already reported
+5. **Write reviewSummary**: Synthesize a 1-3 sentence overall assessment based on all findings
+
+### Pass 2: Validation
+
+The validator independently re-examines each candidate against the diff and codebase.
+
+#### Validation rules
+
+Apply the same Reporting Gate as above, plus reject if ANY of these are true:
+
+- It's speculative / "might" without a concrete trigger
+- It's stylistic / naming / formatting
+- It's not anchored to a valid changed line
+- It's already reported (dedupe against existing comments)
+- The anchor (path/side/line/startLine) would need to change to make the suggestion work
+- It flags missing error handling / try-catch for a code path that won't crash in practice
+- It describes a hypothetical race condition without identifying the specific concurrent access pattern
+- It's about code that appears in the diff but is not part of the PR's primary change
+
+#### Confidence-based filtering
+
+- **P0 findings**: Approve if the trigger path checks out. These should be definite crashes/exploits.
+- **P1 findings**: Approve if you can verify the logic error or security issue is real.
+- **P2 findings**: Reject by default. Only approve if ALL of these are true: (1) you can independently verify the bug exists, (2) the bug has a concrete trigger a user or caller could realistically hit, and (3) the finding is NOT about edge cases, defensive coding, or style. When in doubt about a P2, reject it.
+
+#### Strict deduplication
+
+Before approving a candidate:
+1. **Among candidates**: If two or more candidates describe the same underlying bug (same root cause, even if anchored to different lines), approve only the ONE with the best anchor and clearest explanation. Reject the rest with reason "duplicate of candidate N".
+2. **Against existing comments**: If a candidate repeats an issue already covered by an existing PR comment, reject it.
+3. Same file + overlapping line range + same issue = duplicate, even if the body text differs.
+
 ## Output
 
-Analyze the changes and provide a structured summary of findings. List each finding with its priority, file, line, and description.
+When invoked locally (TUI/CLI), analyze the changes and provide a structured summary of findings. List each finding with its priority, file, line, and description.
 
 Do **not** post inline comments to the PR or submit a GitHub review unless the user explicitly asks for it.