AST-based structural pattern matching (ast-grep integration) #31

@haasonsaas

Problem

CodeRabbit uses ast-grep for structural AST pattern matching alongside LLM review. Their open-source ast-grep-essentials repo has 128 rules covering security, best practices, and anti-patterns. This catches structural issues (missing error handling, deprecated API usage, unsafe patterns) deterministically — no LLM tokens required, no hallucination risk.

DiffScope has a Semgrep plugin, but ast-grep is faster, lighter, and has a growing community rule collection.

How CodeRabbit Does It

  • ast-grep runs in-sandbox alongside 40+ other tools
  • Rules are YAML-based structural patterns matching on ASTs
  • Results are injected into the LLM review prompt as additional context
  • The LLM uses ast-grep findings to inform its review (avoids re-discovering known patterns)
  • Community rules at coderabbitai/ast-grep-essentials:
    • Security: JWT without verification, SQL injection patterns, hardcoded secrets
    • Best practices: missing error handling, deprecated APIs, unsafe type assertions
    • Language-specific: React hooks violations, Go error ignoring, Python anti-patterns

Proposed Solution

Phase 1: ast-grep Plugin

Add ast-grep as a pre-analyzer plugin (like existing ESLint/Semgrep plugins):

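use std::path::PathBuf;

use async_trait::async_trait;

pub struct AstGrepPlugin {
    /// Directory containing ast-grep YAML rules.
    rules_dir: PathBuf,
    /// Languages to scan, e.g. "rust", "typescript".
    languages: Vec<String>,
}

#[async_trait]
impl PreAnalyzer for AstGrepPlugin {
    async fn analyze(&self, diffs: &[UnifiedDiff]) -> Result<Vec<PreAnalysis>> {
        // Run: ast-grep scan --config <sgconfig.yml> --json <changed_files>
        // (`--rule` takes a single rule file; a rules directory is listed
        // under `ruleDirs` in sgconfig.yml)
        // Parse JSON output into PreAnalysis findings
        // Map to file/line positions in the diff
        todo!("not implemented yet; placeholder so the sketch compiles")
    }
}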
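
The three steps in the `analyze` comments could be sketched roughly as follows, using only the standard library. The `Finding` struct, `build_scan_command`, and `findings_in_diff` are hypothetical names for illustration, not DiffScope's actual types. The sketch also assumes an `sgconfig.yml` whose `ruleDirs` points at the rules directory, and an ast-grep version that accepts `--json=stream`. JSON parsing is left out because it would need a crate such as `serde_json`.

```rust
use std::collections::HashMap;
use std::ops::RangeInclusive;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical, simplified finding; DiffScope's real `PreAnalysis` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub file: String,
    pub rule_id: String,
    /// 1-based line number in the new version of the file.
    /// (ast-grep's JSON is documented as zero-based, so a real parser
    /// would add 1; worth verifying against the installed version.)
    pub line: usize,
    pub message: String,
}

/// Build (but do not run) the ast-grep invocation for the changed files.
/// Assumption: `config` is an sgconfig.yml that lists the rules directory.
pub fn build_scan_command(config: &Path, changed_files: &[PathBuf]) -> Command {
    let mut cmd = Command::new("ast-grep");
    cmd.arg("scan")
        .arg("--config")
        .arg(config)
        .arg("--json=stream") // one JSON object per line
        .args(changed_files);
    cmd
}

/// Keep only findings that land on lines the diff actually touched.
/// `changed` maps a file path to the 1-based line ranges added or modified.
pub fn findings_in_diff(
    findings: Vec<Finding>,
    changed: &HashMap<String, Vec<RangeInclusive<usize>>>,
) -> Vec<Finding> {
    findings
        .into_iter()
        .filter(|f| {
            changed
                .get(&f.file)
                .map_or(false, |ranges| ranges.iter().any(|r| r.contains(&f.line)))
        })
        .collect()
}
```

Filtering to changed ranges keeps the plugin from commenting on pre-existing issues elsewhere in a touched file. Whether that is the desired behavior is a design choice this issue leaves open.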

Phase 2: Bundled Rules

  • Bundle coderabbitai/ast-grep-essentials rules (or maintain our own)
  • Support custom rules in .diffscope/ast-grep-rules/
  • Organize by: security, correctness, style, performance
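
One way to organize rules by category is to derive the category from the first directory under the rules root. The sketch below assumes a hypothetical layout such as `.diffscope/ast-grep-rules/security/<rule>.yml`. The `RuleCategory` enum and `category_for_rule` function are illustrative names, not an existing DiffScope API.

```rust
use std::path::Path;

/// The four categories proposed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuleCategory {
    Security,
    Correctness,
    Style,
    Performance,
}

/// Infer a rule's category from the first directory below `rules_root`.
/// Returns `None` for rules outside the root or in an unrecognized folder.
pub fn category_for_rule(rules_root: &Path, rule_file: &Path) -> Option<RuleCategory> {
    let relative = rule_file.strip_prefix(rules_root).ok()?;
    let first = relative.components().next()?.as_os_str().to_str()?;
    match first {
        "security" => Some(RuleCategory::Security),
        "correctness" => Some(RuleCategory::Correctness),
        "style" => Some(RuleCategory::Style),
        "performance" => Some(RuleCategory::Performance),
        _ => None,
    }
}
```

Deriving the category from the directory means rule files need no extra metadata. The trade-off is that a misplaced file silently ends up uncategorized.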

Phase 3: LLM Context Integration

  • Inject ast-grep findings into the review prompt
  • The LLM can reference, explain, or override ast-grep findings
  • Deterministic findings don't need LLM verification; when post_direct is enabled, they are posted directly
  • This reduces the load on the LLM (fewer patterns it needs to catch)
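
A minimal sketch of how Phase 3 might look: findings are rendered into a prompt section so the LLM can reference them, and returned for direct posting when `post_direct` is set. The `Finding` struct, both function names, and the prompt wording are assumptions made for illustration.

```rust
/// Hypothetical, simplified finding; DiffScope's real types may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub file: String,
    pub rule_id: String,
    pub line: usize,
    pub message: String,
}

/// Render findings as a block of text to append to the review prompt.
/// Returns an empty string when there is nothing to report.
pub fn render_prompt_section(findings: &[Finding]) -> String {
    if findings.is_empty() {
        return String::new();
    }
    let mut out = String::from("Static analysis findings (ast-grep):\n");
    for f in findings {
        out.push_str(&format!(
            "- {}:{} [{}] {}\n",
            f.file, f.line, f.rule_id, f.message
        ));
    }
    out.push_str("These were found deterministically; reference them instead of re-reporting.\n");
    out
}

/// Findings to post as review comments without waiting for the LLM.
/// With `post_direct` off, nothing is posted directly.
pub fn direct_comments(findings: &[Finding], post_direct: bool) -> Vec<Finding> {
    if post_direct {
        findings.to_vec()
    } else {
        Vec::new()
    }
}
```

In this sketch the findings go into the prompt even when they are also posted directly, so the LLM can still explain them without duplicating the report.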

Configuration

plugins:
  ast_grep:
    enabled: true
    rules_dir: .diffscope/ast-grep-rules  # custom rules
    bundled_rules: true  # use built-in rule collection
    languages: [rust, typescript, python, go]
    post_direct: true  # post deterministic findings without LLM review
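
On the Rust side, this configuration could map onto a struct along these lines. `AstGrepConfig`, `example`, and `covers_file` are hypothetical names, and the extension-to-language mapping is an assumption covering only the four languages listed above. Deserializing the YAML would additionally need a crate such as `serde`, which is omitted here.

```rust
use std::path::{Path, PathBuf};

/// Mirrors the `plugins.ast_grep` keys from the YAML above.
#[derive(Debug, Clone, PartialEq)]
pub struct AstGrepConfig {
    pub enabled: bool,
    pub rules_dir: PathBuf,
    pub bundled_rules: bool,
    pub languages: Vec<String>,
    pub post_direct: bool,
}

impl AstGrepConfig {
    /// The values from the example configuration (not necessarily the defaults).
    pub fn example() -> Self {
        Self {
            enabled: true,
            rules_dir: PathBuf::from(".diffscope/ast-grep-rules"),
            bundled_rules: true,
            languages: ["rust", "typescript", "python", "go"]
                .iter()
                .map(|s| s.to_string())
                .collect(),
            post_direct: true,
        }
    }

    /// Whether a changed file should be scanned, based on its extension.
    pub fn covers_file(&self, path: &Path) -> bool {
        if !self.enabled {
            return false;
        }
        let language = match path.extension().and_then(|e| e.to_str()) {
            Some("rs") => "rust",
            Some("ts") | Some("tsx") => "typescript",
            Some("py") => "python",
            Some("go") => "go",
            _ => return false,
        };
        self.languages.iter().any(|l| l == language)
    }
}
```

Filtering changed files through `covers_file` before invoking ast-grep would avoid spawning the tool for diffs that only touch unsupported files.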

Why ast-grep Over Semgrep

  • Speed: ast-grep is Rust-based, significantly faster than Semgrep
  • No account/login: some Semgrep registry rule sets and platform features require a login; ast-grep rules are plain YAML files
  • Simpler rule format: YAML pattern matching on AST nodes
  • Growing ecosystem: ast-grep-essentials is actively maintained
  • Both can coexist — Semgrep for deeper security analysis, ast-grep for fast structural checks

Priority

Medium — deterministic quality floor. Catches structural issues without LLM cost or hallucination risk. Complements the LLM review rather than competing with it.

Labels: enhancement (New feature or request)
