AST-based structural pattern matching (ast-grep integration)

## Problem

CodeRabbit runs **ast-grep** for structural AST pattern matching alongside its LLM review. Its open-source `ast-grep-essentials` repo has 128 rules covering security, best practices, and anti-patterns. These rules catch structural issues (missing error handling, deprecated API usage, unsafe patterns) deterministically, with no LLM tokens and no hallucination risk.

DiffScope has a Semgrep plugin, but ast-grep is faster, lighter, and has a growing community rule collection.

## How CodeRabbit Does It

- **ast-grep** runs in-sandbox alongside 40+ other tools
- Rules are YAML-based structural patterns matching on ASTs
- Results are injected into the LLM review prompt as additional context
- The LLM uses ast-grep findings to inform its review (avoids re-discovering known patterns)
- Community rules at `coderabbitai/ast-grep-essentials`:
  - Security: JWT without verification, SQL injection patterns, hardcoded secrets
  - Best practices: missing error handling, deprecated APIs, unsafe type assertions
  - Language-specific: React hooks violations, Go error ignoring, Python anti-patterns

## Proposed Solution

### Phase 1: ast-grep Plugin
Add ast-grep as a pre-analyzer plugin (like existing ESLint/Semgrep plugins):

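```rust
use std::path::PathBuf;

use async_trait::async_trait;

/// Pre-analyzer that shells out to the `ast-grep` CLI.
pub struct AstGrepPlugin {
    /// Directory of YAML rules, listed under `ruleDirs` in an `sgconfig.yml`.
    rules_dir: PathBuf,
    /// Languages to scan.
    languages: Vec<String>,
}

#[async_trait]
impl PreAnalyzer for AstGrepPlugin {
    async fn analyze(&self, diffs: &[UnifiedDiff]) -> Result<Vec<PreAnalysis>> {
        // 1. Run: ast-grep scan --config <sgconfig.yml> --json <changed_files>
        //    (`--rule` takes a single rule file, so a rules directory has to be
        //    referenced through `ruleDirs` in sgconfig.yml)
        // 2. Parse the JSON output into PreAnalysis findings
        // 3. Map findings to file/line positions in the diff
        todo!("not implemented in this proposal")
    }
}
```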
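
The two mechanical steps in `analyze` (building the CLI invocation and restricting findings to changed lines) could look roughly like the sketch below. It uses only the standard library. `Finding`, `build_scan_command`, and `findings_in_diff` are hypothetical names for illustration, not existing DiffScope types, and the sketch assumes an `sgconfig.yml` that points at the rules directory. JSON parsing is left out; ast-grep appears to report zero-based line numbers in its JSON output, so the parser would need to convert to the 1-based lines assumed here.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;
use std::path::{Path, PathBuf};
use std::process::Command;

/// One ast-grep match, reduced to the fields this sketch needs.
/// Hypothetical type: the real `PreAnalysis` shape is defined by DiffScope.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub file: PathBuf,
    /// 1-based line number in the new version of the file.
    pub line: usize,
    pub rule_id: String,
    pub message: String,
}

/// Build `ast-grep scan --config <sgconfig> --json=stream <files...>`.
/// The command is only constructed here, not executed.
pub fn build_scan_command(sgconfig: &Path, files: &[PathBuf]) -> Command {
    let mut cmd = Command::new("ast-grep");
    cmd.arg("scan")
        .arg("--config")
        .arg(sgconfig)
        .arg("--json=stream");
    cmd.args(files);
    cmd
}

/// Keep only findings that land on lines the diff added or modified.
/// `changed` maps each file to the line ranges touched by its hunks.
pub fn findings_in_diff(
    findings: Vec<Finding>,
    changed: &HashMap<PathBuf, Vec<RangeInclusive<usize>>>,
) -> Vec<Finding> {
    findings
        .into_iter()
        .filter(|f| {
            changed
                .get(&f.file)
                .map_or(false, |ranges| ranges.iter().any(|r| r.contains(&f.line)))
        })
        .collect()
}
```

Filtering by hunk ranges keeps pre-existing violations elsewhere in a changed file out of the review, which is usually what a diff-scoped reviewer wants.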

### Phase 2: Bundled Rules
- Bundle `coderabbitai/ast-grep-essentials` rules (or maintain our own)
- Support custom rules in `.diffscope/ast-grep-rules/`
- Organize by: security, correctness, style, performance
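
One possible way to organize rules by category is a directory per category under the rules root. The sketch below assumes that layout (for example `.diffscope/ast-grep-rules/security/*.yml`), which is an illustration and not how `ast-grep-essentials` is necessarily laid out. `group_rules_by_category` and `CATEGORIES` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Categories proposed in this issue.
pub const CATEGORIES: [&str; 4] = ["security", "correctness", "style", "performance"];

/// Group rule files by the category directory directly under `rules_root`.
/// Files outside `rules_root`, outside a known category directory, or
/// without a YAML extension are skipped.
pub fn group_rules_by_category(
    rules_root: &Path,
    rule_files: &[PathBuf],
) -> BTreeMap<String, Vec<PathBuf>> {
    let mut grouped: BTreeMap<String, Vec<PathBuf>> = BTreeMap::new();
    for file in rule_files {
        let is_yaml = matches!(
            file.extension().and_then(|e| e.to_str()),
            Some("yml" | "yaml")
        );
        if !is_yaml {
            continue;
        }
        // Path relative to the rules root; skip files that live elsewhere.
        let Ok(rel) = file.strip_prefix(rules_root) else {
            continue;
        };
        // The first path component is treated as the category name.
        let Some(category) = rel.components().next().and_then(|c| c.as_os_str().to_str()) else {
            continue;
        };
        if CATEGORIES.contains(&category) {
            grouped
                .entry(category.to_string())
                .or_default()
                .push(file.clone());
        }
    }
    grouped
}
```

The same grouping would work for bundled and custom rules alike, so a per-category enable/disable switch could be added later without changing the layout.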

### Phase 3: LLM Context Integration
- Inject ast-grep findings into the review prompt
- The LLM can reference, explain, or override ast-grep findings
- Deterministic findings don't need LLM verification and can be posted directly (controlled by `post_direct`)
- This reduces the load on the LLM (fewer patterns it needs to catch)
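
The routing described in this phase can be sketched as a single pure function: every finding is rendered into a prompt section, and when `post_direct` is set the findings are also returned for direct posting, with a note telling the LLM not to repeat them. `Finding`, `Routed`, `route_findings`, and the prompt wording are all assumptions made for illustration.

```rust
/// One ast-grep finding (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub file: String,
    pub line: usize,
    pub rule_id: String,
    pub message: String,
}

/// Result of routing: what to post as-is and what to tell the LLM.
#[derive(Debug, Clone, PartialEq)]
pub struct Routed {
    /// Findings posted as review comments without LLM involvement.
    pub post_directly: Vec<Finding>,
    /// Text block appended to the review prompt (empty if no findings).
    pub prompt_context: String,
}

/// Render findings for the prompt and decide which ones to post directly.
pub fn route_findings(findings: &[Finding], post_direct: bool) -> Routed {
    let mut prompt_context = String::new();
    if !findings.is_empty() {
        prompt_context.push_str("Static analysis findings (ast-grep):\n");
        for f in findings {
            prompt_context.push_str(&format!(
                "- {}:{} [{}] {}\n",
                f.file, f.line, f.rule_id, f.message
            ));
        }
        if post_direct {
            // Avoid duplicate comments when findings are posted directly.
            prompt_context
                .push_str("These are already posted as comments; do not repeat them.\n");
        }
    }
    Routed {
        post_directly: if post_direct { findings.to_vec() } else { Vec::new() },
        prompt_context,
    }
}
```

With `post_direct: false`, nothing is posted automatically and the LLM decides which findings to surface, explain, or override.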

### Configuration
```yaml
plugins:
  ast_grep:
    enabled: true
    rules_dir: .diffscope/ast-grep-rules  # custom rules
    bundled_rules: true  # use built-in rule collection
    languages: [rust, typescript, python, go]
    post_direct: true  # post deterministic findings without LLM review
```
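
On the Rust side, the config above might map to a struct like the following. This is a sketch only: `AstGrepConfig` and `rule_dirs` are hypothetical names, deserialization is omitted, and the rule that bundled rules are listed before custom ones is an assumption.

```rust
use std::path::{Path, PathBuf};

/// In-memory form of the `plugins.ast_grep` config section (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct AstGrepConfig {
    pub enabled: bool,
    /// Custom rules directory, if any.
    pub rules_dir: Option<PathBuf>,
    /// Whether to include the built-in rule collection.
    pub bundled_rules: bool,
    pub languages: Vec<String>,
    /// Post deterministic findings without LLM review.
    pub post_direct: bool,
}

impl AstGrepConfig {
    /// Directories to hand to ast-grep, bundled rules first.
    /// Returns an empty list when the plugin is disabled.
    pub fn rule_dirs(&self, bundled_dir: &Path) -> Vec<PathBuf> {
        if !self.enabled {
            return Vec::new();
        }
        let mut dirs = Vec::new();
        if self.bundled_rules {
            dirs.push(bundled_dir.to_path_buf());
        }
        if let Some(custom) = &self.rules_dir {
            dirs.push(custom.clone());
        }
        dirs
    }
}
```

The resulting list is what would be written into the `ruleDirs` field of a generated `sgconfig.yml`.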

## Why ast-grep Over Semgrep

- **Speed**: ast-grep is written in Rust and is generally faster than Semgrep on structural scans
- **No account/login**: ast-grep rules are plain YAML files that can live in the repo; parts of the Semgrep registry and platform require a login
- **Simpler rule format**: YAML pattern matching on AST nodes
- **Growing ecosystem**: ast-grep-essentials is actively maintained
- Both can coexist: Semgrep for deeper security analysis, ast-grep for fast structural checks

## Priority

**Medium — deterministic quality floor.** Catches structural issues without LLM cost or hallucination risk. Complements the LLM review rather than competing with it.
