Hi FreshStack team,
Thank you for releasing FreshStack. A benchmark and framework for RAG over technical documentation is extremely useful for both research and industry.
I have been working on 16-mode failure maps for RAG systems and recently contributed a robustness-related entry to Harvard MIMS Lab's ToolUniverse. In FreshStack-style settings, I often see the same recurring issues:
- retrieval that focuses on popular pages rather than the correct ones
- confusion between similar APIs or versions in the documentation
- answer evaluation that does not fully reflect grounding in the retrieved docs
- experiments that are hard to reproduce because configuration details are not recorded
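To make the grounding issue above concrete, here is a minimal sketch of a crude lexical-overlap check that flags answer sentences with little support in the retrieved passages. The function name, threshold, and tokenization are my own assumptions for illustration; a real evaluation would use entailment or nugget-level matching rather than token overlap.

```python
import re

def ungrounded_sentences(answer: str, passages: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Return answer sentences whose token overlap with the retrieved
    passages falls below `threshold` (illustrative heuristic only)."""
    doc_tokens = set(re.findall(r"\w+", " ".join(passages).lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        tokens = set(re.findall(r"\w+", sentence.lower()))
        if tokens and len(tokens & doc_tokens) / len(tokens) < threshold:
            flagged.append(sentence)
    return flagged

# Hypothetical example: the second sentence is unsupported by the passage.
passages = ["The connect() API accepts a timeout parameter in seconds."]
answer = ("connect() accepts a timeout in seconds. "
          "It also retries five times automatically.")
print(ungrounded_sentences(answer, passages))
# → ['It also retries five times automatically.']
```

Even a heuristic like this can make "evaluation does not reflect grounding" actionable in a checklist: report how unsupported claims were detected, not just the final score.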
I would like to propose a small, documentation-only evaluation checklist for FreshStack users.
Proposed feature
Add a short markdown page under the repo, for example:
freshstack_rag_evaluation_failure_modes_and_checklist.md
The page could:
- List typical RAG failure modes specific to technical docs (API confusion, version drift, incomplete code snippets).
- For each failure mode, describe:
  - symptoms in FreshStack evaluations
  - likely causes (retrieval settings, corpus preparation, query formulation)
- Provide a short checklist for running and reporting FreshStack experiments:
  - corpus version, retrieval configuration, model, and key evaluation settings
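To show what the reporting checklist could look like in practice, here is a minimal sketch of an experiment record that captures those fields and serializes them for inclusion alongside results. All field names and values are illustrative assumptions, not part of FreshStack's actual API.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ExperimentRecord:
    """Hypothetical record of one FreshStack-style experiment run.
    Field names are assumptions chosen to mirror the checklist items."""
    corpus_version: str        # e.g. corpus snapshot date or git commit
    retriever: str             # retrieval method identifier
    retriever_params: dict     # top-k, chunk size, etc.
    model: str                 # generator model name
    eval_settings: dict = field(default_factory=dict)  # metrics, judge, prompts

    def to_json(self) -> str:
        """Serialize the record so it can be attached to reported results."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

# Hypothetical usage: values below are placeholders, not real settings.
record = ExperimentRecord(
    corpus_version="2024-10-snapshot",
    retriever="bm25",
    retriever_params={"top_k": 10, "chunk_tokens": 512},
    model="example-llm-v1",
    eval_settings={"metric": "nugget-coverage"},
)
print(record.to_json())
```

Attaching such a record to every reported number would address the reproducibility issue directly, since readers could re-run an experiment without guessing at unreported settings.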
Motivation
- FreshStack is likely to become a standard reference for technical-doc RAG.
- A small failure-mode and reporting checklist would help ensure that evaluations are interpretable and comparable across systems.
- This is a docs-only change and should be straightforward to maintain.
If this is aligned with your goals for FreshStack, I would be glad to propose a concise initial draft in a PR.
Thank you for considering.