Embedding-based false positive filtering from developer feedback #27

@haasonsaas

Description

Problem

DiffScope has a convention learner with Wilson score confidence intervals — statistically more rigorous than competitors. But Greptile's embedding-based approach to false positive filtering is empirically the most effective technique published in the space, taking their comment address rate from 19% to 55%+.

How Greptile Does It (from their "Make LLMs Shut Up" blog)

What failed:

  • Prompt engineering / few-shot: Model "inferred superficial characteristics" rather than learning meaningful patterns. Backfired.
  • LLM-as-judge: A secondary LLM rating comments 1-10 was "nearly random in its judgment of its own output".

What works:

  1. Store embeddings of all past review comments, tagged with developer 👍/👎 feedback
  2. For each new comment the LLM wants to post:
    • Compute cosine similarity against the feedback database
    • Block if similar to 3+ distinct downvoted comments
    • Pass if similar to 3+ upvoted comments
    • Pass ambiguous cases (not enough signal)
  3. Result: address rate went from 19% to 55%+

Key insight: "Nits are subjective — definitions and standards vary from team to team." This must be learned per-team, not universally.
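The similarity check in step 2 reduces to a plain cosine-similarity computation over embedding vectors. A minimal sketch (hypothetical helper, not from Greptile's post; in practice the vector DB does this server-side):

```rust
/// Cosine similarity between two embedding vectors.
/// Returns a value in [-1.0, 1.0]; 1.0 means identical direction.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // degenerate embedding: treat as no signal
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 0.0];
    let b = [1.0, 0.0];
    let c = [0.0, 1.0];
    println!("{:.2}", cosine_similarity(&a, &b)); // 1.00
    println!("{:.2}", cosine_similarity(&a, &c)); // 0.00
}
```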

Proposed Solution

Enhance the existing feedback system (FeedbackStore) with embedding-based similarity:

Data Model

CREATE EXTENSION IF NOT EXISTS vector;  -- pgvector, required for the vector type

CREATE TABLE review_feedback (
    id SERIAL PRIMARY KEY,
    repo TEXT NOT NULL,
    comment_text TEXT NOT NULL,
    comment_embedding vector(1536),
    category TEXT,  -- logic, style, security, etc.
    file_pattern TEXT,  -- e.g., "*.rs", "src/api/**"
    feedback TEXT NOT NULL,  -- 'accepted' or 'rejected'
    created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON review_feedback USING ivfflat (comment_embedding vector_cosine_ops);
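A `find_similar` lookup against this table could use pgvector's cosine-distance operator `<=>` (a sketch; the `$n` parameters and the `LIMIT` are placeholders):

```sql
-- Nearest past comments within the similarity cutoff for this repo.
-- <=> returns cosine *distance*, so similarity = 1 - distance.
SELECT id, feedback, 1 - (comment_embedding <=> $1) AS similarity
FROM review_feedback
WHERE repo = $2
  AND 1 - (comment_embedding <=> $1) >= $3  -- similarity_cutoff, e.g. 0.85
ORDER BY comment_embedding <=> $1
LIMIT 20;
```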

Filtering Logic

async fn should_post_comment(
    comment: &Comment,
    feedback_db: &FeedbackDb,
    threshold: usize,        // default 3
    similarity_cutoff: f32,  // default 0.85
) -> anyhow::Result<bool> {
    let embedding = embed(comment.text()).await?;
    let similar = feedback_db.find_similar(embedding, similarity_cutoff).await?;

    let rejected = similar.iter().filter(|f| f.feedback == "rejected").count();
    let accepted = similar.iter().filter(|f| f.feedback == "accepted").count();

    if rejected >= threshold {
        return Ok(false); // block: matches threshold+ rejected comments
    }
    if accepted >= threshold {
        return Ok(true); // pass: matches threshold+ accepted comments
    }
    Ok(true) // ambiguous → pass (err on the side of posting)
}

Feedback Collection

  • diffscope feedback accept <comment-id> — existing CLI, add embedding storage
  • diffscope feedback reject <comment-id> — existing CLI, add embedding storage
  • GitHub reactions (👍/👎) on posted PR comments → auto-collect via webhook
  • Resolved/unresolved thread status → signal for accepted/rejected
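For the GitHub reaction path, the mapping from a reaction's `content` field to a feedback label is straightforward; `"+1"` and `"-1"` are the values GitHub's reactions API uses for 👍/👎 (sketch; function name is hypothetical):

```rust
/// Map a GitHub reaction's `content` field to a feedback label.
/// "+1" / "-1" are GitHub's API values for the 👍 / 👎 reactions.
fn feedback_from_reaction(content: &str) -> Option<&'static str> {
    match content {
        "+1" => Some("accepted"),
        "-1" => Some("rejected"),
        _ => None, // laugh, heart, etc.: no clear accept/reject signal
    }
}

fn main() {
    println!("{:?}", feedback_from_reaction("+1"));    // Some("accepted")
    println!("{:?}", feedback_from_reaction("-1"));    // Some("rejected")
    println!("{:?}", feedback_from_reaction("heart")); // None
}
```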

Relationship to Existing Convention Learner

  • The Wilson score convention learner operates on exact pattern matches (rule_id, file pattern, category)
  • Embedding-based filtering operates on semantic similarity of the comment text
  • Both should run: Wilson score for structured rules, embeddings for fuzzy/subjective nits
  • The embedding filter runs first (cheap vector lookup), Wilson score augments
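For context, the Wilson score side of this pairing is the standard lower confidence bound on a binomial proportion, which penalizes small samples. A sketch of the formula (DiffScope's actual convention-learner API may differ):

```rust
/// Wilson score lower bound for a binomial proportion at z = 1.96 (~95%).
/// A rule accepted 9/10 times scores lower than one accepted 90/100 times,
/// which is the small-sample correction the convention learner relies on.
fn wilson_lower_bound(accepted: u32, total: u32) -> f64 {
    if total == 0 {
        return 0.0; // no observations: no confidence
    }
    let z = 1.96_f64;
    let n = total as f64;
    let p = accepted as f64 / n;
    let z2 = z * z;
    let center = p + z2 / (2.0 * n);
    let margin = z * ((p * (1.0 - p) / n) + z2 / (4.0 * n * n)).sqrt();
    (center - margin) / (1.0 + z2 / n)
}

fn main() {
    // More evidence at the same accept rate → higher lower bound.
    println!("{:.3}", wilson_lower_bound(9, 10));
    println!("{:.3}", wilson_lower_bound(90, 100));
}
```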

Expected Impact

Greptile's published numbers: 19% → 55%+ address rate. Even half that improvement would be significant for DiffScope's signal-to-noise ratio.

Priority

High — direct attack on the #1 churn driver (review fatigue from noisy comments).

Labels: enhancement (New feature or request)