Verification pass to catch hallucinations #23

@haasonsaas

Description

Problem

LLMs hallucinate. They suggest fixes for code that doesn't exist, misread line numbers, and flag non-issues. CodeRabbit's primary anti-hallucination defense is a dedicated Verification Agent — a second LLM pass that validates the Review Agent's output against the actual code. Qodo uses a similar "self-reflection" mechanism for code suggestions.

DiffScope has confidence scoring but no verification pass.

How Competitors Do It

CodeRabbit — Verification Agent

  • A separate agent that runs after the Review Agent
  • Receives: the review comments + the actual code
  • Validates: does each comment accurately reference real code? Is the suggestion correct? Is the line number right?
  • Suppresses hallucinated suggestions before they're posted
  • This is their "secret weapon" for false-positive reduction, beyond what the learnings system catches

Qodo Merge — Self-Reflection

  • Two-pass architecture (only for code suggestions):
    1. Pass 1: Generate suggestions with labels and code blocks
    2. Pass 2: Send all suggestions + original diff to a reasoning model
  • The reflection model:
    • Verifies existing_code actually matches the PR diff
    • Verifies improved_code accurately reflects the fix
    • Checks contextual accuracy beyond specified lines
    • Scores each suggestion 0-10
  • Scoring guidelines:
    • 8-10: Critical bugs/security
    • 3-7: Minor issues
    • Auto-0: Docstrings, type hints, unused imports (noise)
    • Capped at 7: Verification-only suggestions
  • Suggestions below threshold are dropped
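
The scoring rules above can be sketched as a filtering step. This is an illustrative reconstruction, not Qodo's actual code; the struct fields, label names, and thresholds are assumptions:

```rust
// Hypothetical suggestion record; fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct Suggestion {
    label: String,
    verification_only: bool,
    score: u8,
}

// Noise categories that get auto-zeroed (names assumed).
const NOISE_LABELS: &[&str] = &["docstrings", "type_hints", "unused_imports"];

fn apply_scoring_rules(mut s: Suggestion) -> Suggestion {
    if NOISE_LABELS.contains(&s.label.as_str()) {
        s.score = 0; // auto-zero: docstrings, type hints, unused imports
    } else if s.verification_only {
        s.score = s.score.min(7); // verification-only suggestions capped at 7
    }
    s
}

fn filter_suggestions(suggestions: Vec<Suggestion>, min_score: u8) -> Vec<Suggestion> {
    suggestions
        .into_iter()
        .map(apply_scoring_rules)
        .filter(|s| s.score >= min_score) // drop below-threshold suggestions
        .collect()
}
```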

Greptile — Confidence Thresholding

  • The agent maintains an increasing "threshold for sureness"
  • Lower-confidence observations get eliminated through the agentic loop
  • The agent challenges its own hypotheses by searching for counter-evidence
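
A toy reading of that rising-threshold loop (our interpretation of the description, not Greptile's actual code; the starting bar and step size are made-up numbers):

```rust
// Each iteration re-filters surviving observations, then raises the bar,
// so only observations the agent stays confident in survive all passes.
fn prune_with_rising_threshold(
    mut observations: Vec<(String, f32)>, // (observation, confidence)
    rounds: u32,
) -> Vec<(String, f32)> {
    let mut threshold = 0.3_f32; // initial "sureness" bar (illustrative)
    for _ in 0..rounds {
        observations.retain(|(_, confidence)| *confidence >= threshold);
        threshold += 0.2; // tighten the bar for the next agentic pass
    }
    observations
}
```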

Proposed Solution

Add a verification pass after the review LLM generates findings:

Implementation

async fn verify_findings(
    findings: Vec<Comment>,
    diff: &UnifiedDiff,
    context: &ReviewContext,
    config: &ModelConfig,
) -> Vec<Comment> {
    // Build a verification prompt pairing each finding with the actual
    // code it references, then ask the model, for each finding:
    //   1. Does the referenced code actually exist at the specified line?
    //   2. Is the issue description accurate?
    //   3. Is the suggested fix correct and complete?
    //   4. Score confidence 0-10.
    // Parse the responses and drop findings below the configured threshold.
}
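
One piece the stub above needs is context extraction: pulling a window of real file lines around each finding so the verifier judges against actual code (this is how the "Actual Code (src/api/handler.rs:35-50)" section of the prompt would be built). A minimal sketch; the function name and radius are illustrative choices:

```rust
// Return (start_line, end_line, snippet) for a 1-indexed window of
// `radius` lines on each side of the finding's line.
fn code_window(file_lines: &[&str], line: usize, radius: usize) -> (usize, usize, String) {
    // Clamp a bad line reference into the file's bounds rather than panic.
    let line = line.clamp(1, file_lines.len().max(1));
    let start = line.saturating_sub(radius).max(1);
    let end = (line + radius).min(file_lines.len());
    let snippet = file_lines[start - 1..end].join("\n");
    (start, end, snippet)
}
```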

Verification Prompt Structure

You are a code review verifier. For each review finding below, verify it against the actual code and score its accuracy 0-10.

## Finding 1
- File: src/api/handler.rs:42
- Issue: "Missing null check on user input"
- Suggested fix: ...

## Actual Code (src/api/handler.rs:35-50)
[actual code from the file]

For each finding, respond:
- accurate: true/false (does the issue actually exist?)
- line_correct: true/false (is the line reference right?)
- fix_correct: true/false (would the suggested fix work?)
- score: 0-10
- reason: why this score
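
Parsing that per-finding response could look like the sketch below. A real implementation would more likely request structured JSON output; this dependency-free version assumes the model answers with one `key: value` pair per line as in the format above:

```rust
#[derive(Debug, Default, PartialEq)]
struct Verdict {
    accurate: bool,
    line_correct: bool,
    fix_correct: bool,
    score: u8,
    reason: String,
}

fn parse_verdict(block: &str) -> Verdict {
    let mut v = Verdict::default();
    for line in block.lines() {
        // Tolerate both "key: value" and "- key: value" bullet forms.
        let line = line.trim().trim_start_matches("- ");
        if let Some((key, value)) = line.split_once(':') {
            let value = value.trim();
            match key.trim() {
                "accurate" => v.accurate = value == "true",
                "line_correct" => v.line_correct = value == "true",
                "fix_correct" => v.fix_correct = value == "true",
                "score" => v.score = value.parse().unwrap_or(0),
                "reason" => v.reason = value.to_string(),
                _ => {} // ignore unknown keys rather than fail the pass
            }
        }
    }
    v
}
```

Defaulting unparseable scores to 0 is a deliberate fail-closed choice: a verdict the verifier cannot parse should be dropped, not posted.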

Model Selection

  • Use the same model as the review for consistency
  • Or use a cheaper model (Haiku/GPT-4o-mini) since verification is simpler than generation
  • Configurable: verification_model in config

Configuration

verification:
  enabled: true  # default true
  model: null  # null = use review model
  min_score: 5  # drop findings below this
  auto_zero:
    - docstrings
    - type_hints
    - import_ordering
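
A possible Rust mirror of that YAML block, including the `model: null` fallback rule. The struct and function names are hypothetical, not existing DiffScope APIs:

```rust
#[derive(Debug, Clone)]
struct VerificationConfig {
    enabled: bool,
    model: Option<String>, // None = use the review model
    min_score: u8,         // drop findings scored below this
    auto_zero: Vec<String>,
}

impl Default for VerificationConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            model: None,
            min_score: 5,
            auto_zero: vec![
                "docstrings".to_string(),
                "type_hints".to_string(),
                "import_ordering".to_string(),
            ],
        }
    }
}

// Resolve which model the verification pass should call.
fn effective_model<'a>(cfg: &'a VerificationConfig, review_model: &'a str) -> &'a str {
    cfg.model.as_deref().unwrap_or(review_model)
}
```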

Expected Impact

  • CodeRabbit attributes much of their review quality to this verification step
  • Qodo's self-reflection reportedly eliminates 20-40% of generated suggestions as inaccurate
  • Combined with DiffScope's existing confidence scoring and convention learner, this should significantly reduce false positives

Priority

High — second highest-leverage improvement after RAG. Direct attack on false positive rate.

Labels: enhancement (New feature or request)