Robust LLM output parsing with fallback strategies #28

@haasonsaas

Problem

LLMs frequently produce malformed structured output. Qodo Merge applies 9 sequential fallback strategies when parsing YAML output from LLMs, a hard-won production lesson. DiffScope parses LLM output via its parsing/ modules but may not be as resilient to malformed responses.

How Qodo Does It

From pr_agent/algo/utils.py, function load_yaml() with try_fix_yaml():

  1. Direct yaml.safe_load() — the happy path
  2. YAML literal block (|-) conversion
  3. Pipe character replacement (| → |2)
  4. Root-level indentation fixes
  5. Snippet extraction between backticks (```yaml ... ```)
  6. Curly bracket removal (JSON-like output from LLM)
  7. Key-range extraction (grab just the relevant YAML section)
  8. Leading + character removal (diff artifacts bleeding into output)
  9. Tab-to-space conversion and encoding fallbacks (latin-1, utf-16)
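
Several of the repairs above are plain string transformations applied to the raw text before retrying the parser. A minimal Rust sketch of two of them (leading + removal and tab-to-space conversion) follows. The function names strip_leading_plus and tabs_to_spaces are hypothetical, and this illustrates the idea rather than reproducing Qodo's code:

```rust
/// Replace a leading '+' on each line with a space. Diff markers sometimes
/// bleed into LLM output; swapping them for a space (instead of deleting
/// them) keeps every line's column alignment intact.
/// Note: a trailing newline in the input is not preserved.
fn strip_leading_plus(text: &str) -> String {
    text.lines()
        .map(|line| match line.strip_prefix('+') {
            Some(rest) => format!(" {}", rest),
            None => line.to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Replace every tab with four spaces. Tabs are not valid YAML
/// indentation, so this is worth trying before giving up.
/// The width of four is an arbitrary choice for this sketch.
fn tabs_to_spaces(text: &str) -> String {
    text.replace('\t', "    ")
}
```

Each repair returns a new candidate string, so the caller can retry the parser after every step and stop at the first candidate that parses.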

Proposed Solution

Audit and harden DiffScope's LLM output parsing:

1. Audit Current Parsing

  • Review parsing/llm_response.rs and parsing/smart_review_response.rs
  • Identify failure modes from production use
  • Add error tracking to measure parse failure rates

2. Implement Fallback Chain

use anyhow::{anyhow, Result};

// The helper functions called below are part of this proposal.
// Comment is assumed to implement serde::Deserialize.
fn parse_llm_response(raw: &str) -> Result<Vec<Comment>> {
    // Strategy 1: Direct JSON parse (the happy path)
    if let Ok(comments) = serde_json::from_str(raw) { return Ok(comments); }

    // Strategy 2: Extract JSON from markdown code blocks
    if let Some(json_block) = extract_code_block(raw, "json") {
        if let Ok(comments) = serde_json::from_str(&json_block) { return Ok(comments); }
    }

    // Strategy 3: Fix common JSON issues (trailing commas, single quotes)
    let fixed = fix_common_json_issues(raw);
    if let Ok(comments) = serde_json::from_str(&fixed) { return Ok(comments); }

    // Strategy 4: Extract individual comment objects with regex
    if let Ok(comments) = extract_comments_regex(raw) { return Ok(comments); }

    // Strategy 5: Line-by-line structured extraction
    if let Ok(comments) = parse_structured_text(raw) { return Ok(comments); }

    // Strategy 6 (last resort, not sketched here): ask the LLM to reformat.
    // Until that exists, report that every local strategy failed.
    Err(anyhow!("Failed to parse LLM output after all strategies"))
}
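
To make strategies 2 and 3 more concrete, here is a rough, std-only sketch of what extract_code_block and one piece of fix_common_json_issues (trailing comma removal) could look like. Both bodies are assumptions made for illustration, as is the name remove_trailing_commas. The single-quote fix is not covered.

```rust
/// Return the trimmed contents of the first fenced block whose opening
/// fence starts with three backticks followed by `lang`, or None if no
/// complete block is found.
fn extract_code_block(raw: &str, lang: &str) -> Option<String> {
    let fence = "`".repeat(3);
    let open = format!("{}{}", fence, lang);
    let after_tag = raw.find(&open)? + open.len();
    // Skip whatever remains on the opening fence line.
    let body_start = after_tag + raw[after_tag..].find('\n')? + 1;
    let body_len = raw[body_start..].find(&fence)?;
    Some(raw[body_start..body_start + body_len].trim().to_string())
}

/// Drop commas that are followed (ignoring whitespace) by '}' or ']'.
/// Commas inside double-quoted strings are left untouched.
fn remove_trailing_commas(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::with_capacity(input.len());
    let mut in_string = false;
    let mut escaped = false;
    for (i, &c) in chars.iter().enumerate() {
        if in_string {
            out.push(c);
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => {
                in_string = true;
                out.push(c);
            }
            ',' => {
                // Look ahead past whitespace for a closing bracket.
                let next = chars[i + 1..]
                    .iter()
                    .copied()
                    .find(|ch| !ch.is_whitespace());
                if !matches!(next, Some('}') | Some(']')) {
                    out.push(c);
                }
            }
            _ => out.push(c),
        }
    }
    out
}
```

Tracking string state matters here: a naive global replace of ",]" would corrupt comment bodies that happen to contain that sequence. It may also be worth running the comma fix on the extracted block, not only on the raw response.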

3. Track Parse Failures

  • Log parse failure rate per strategy
  • Surface in metrics/analytics
  • Feed back into prompt engineering (if certain models consistently produce bad output)
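
One possible shape for the per-strategy tracking, sketched with an in-memory counter. The ParseStats and StrategyCount types and the strategy labels are hypothetical. A real implementation would presumably hook into whatever metrics system DiffScope already uses.

```rust
use std::collections::BTreeMap;

/// Attempt and failure counts for a single parsing strategy.
#[derive(Debug, Default, Clone, Copy)]
struct StrategyCount {
    attempts: u64,
    failures: u64,
}

/// Per-strategy parse statistics, keyed by a static strategy label.
#[derive(Debug, Default)]
struct ParseStats {
    per_strategy: BTreeMap<&'static str, StrategyCount>,
}

impl ParseStats {
    /// Record one attempt of `strategy` and whether it parsed successfully.
    fn record(&mut self, strategy: &'static str, ok: bool) {
        let entry = self.per_strategy.entry(strategy).or_default();
        entry.attempts += 1;
        if !ok {
            entry.failures += 1;
        }
    }

    /// Failure rate for one strategy, or None if it was never attempted.
    fn failure_rate(&self, strategy: &str) -> Option<f64> {
        let count = self.per_strategy.get(strategy)?;
        Some(count.failures as f64 / count.attempts as f64)
    }
}
```

Calling record after each strategy in the fallback chain would show both how often the happy path fails and which fallbacks actually rescue responses, which is the signal needed for the prompt engineering feedback loop.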

Priority

Medium — production reliability. Low effort, high resilience. Every tool in the space has learned this lesson the hard way.

Labels

enhancement
