Hi FreshStack team,
Thank you for releasing FreshStack. A benchmark and framework for RAG over technical documentation is extremely useful for both research and industry.
I have been working on 16-mode failure maps for RAG systems and recently contributed a robustness-related entry to Harvard MIMS Lab's ToolUniverse. In FreshStack-style settings, I often see the same recurring issues:
- retrieval that focuses on popular pages rather than the correct ones
- confusion between similar APIs or versions in the documentation
- answer evaluation that does not fully reflect grounding in the retrieved docs
- experiments that are hard to reproduce because configuration details are not recorded
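To make the grounding issue above concrete, here is a minimal sketch of a crude lexical-overlap check that flags answer sentences with little support in the retrieved passages. The function name, threshold, and tokenization are my own assumptions for illustration; a real evaluation would use entailment or nugget-level matching rather than token overlap.

```python
import re

def ungrounded_sentences(answer: str, passages: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Return answer sentences whose token overlap with the retrieved
    passages falls below `threshold` (illustrative heuristic only)."""
    doc_tokens = set(re.findall(r"\w+", " ".join(passages).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if tokens and len(tokens & doc_tokens) / len(tokens) < threshold:
            flagged.append(sentence)
    return flagged

# Hypothetical example: the second sentence is unsupported by the passage.
passages = ["The connect() API accepts a timeout parameter in seconds."]
answer = ("connect() accepts a timeout in seconds. "
          "It also retries five times automatically.")
print(ungrounded_sentences(answer, passages))
# → ['It also retries five times automatically.']
```

Even a heuristic like this can make "evaluation does not reflect grounding" actionable in a checklist: report how unsupported claims were detected, not just the final score.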
I would like to propose a small, documentation-only evaluation checklist for FreshStack users.
Proposed feature
Add a short markdown page under the repo, for example:
freshstack_rag_evaluation_failure_modes_and_checklist.md
The page could:
- List typical RAG failure modes specific to technical docs (API confusion, version drift, incomplete code snippets).
- For each failure mode, describe:
  - symptoms in FreshStack evaluations
  - likely causes (retrieval settings, corpus preparation, query formulation)
- Provide a short checklist for running and reporting FreshStack experiments:
  - corpus version, retrieval configuration, model, and key evaluation settings
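To show what the reporting checklist could look like in practice, here is a minimal sketch of an experiment record that captures those fields and serializes them for inclusion alongside results. All field names and values are illustrative assumptions, not part of FreshStack's actual API.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical record of one FreshStack-style experiment run.
    Field names are assumptions chosen to mirror the checklist items."""
    corpus_version: str        # e.g. corpus snapshot date or git commit
    retriever: str             # retrieval method identifier
    retriever_params: dict     # top-k, chunk size, etc.
    model: str                 # generator model name
    eval_settings: dict = field(default_factory=dict)  # metrics, judge, prompts

    def to_json(self) -> str:
        """Serialize the record so it can be attached to reported results."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

# Hypothetical usage: values below are placeholders, not real settings.
record = ExperimentRecord(
    corpus_version="2024-10-snapshot",
    retriever="bm25",
    retriever_params={"top_k": 10, "chunk_tokens": 512},
    model="example-llm-v1",
    eval_settings={"metric": "nugget-coverage"},
)
print(record.to_json())
```

Attaching such a record to every reported number would address the reproducibility issue directly, since readers could re-run an experiment without guessing at unreported settings.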
Motivation
- FreshStack is likely to become a standard reference for technical-doc RAG.
- A small failure-mode and reporting checklist would help ensure that evaluations are interpretable and comparable across systems.
- This is a docs-only change and should be straightforward to maintain.
If this is aligned with your goals for FreshStack, I would be glad to propose a concise initial draft in a PR.
Thank you for considering.