SourceMind turns 1,000-page document sets into a question-answerable knowledge base with page-level citations — then augments answers with live PubMed, clinical trial, and legal research.
Built on Recursive Language Models (RLMs), not basic RAG. The RLM writes and executes Python code to navigate documents, which lets it handle inputs two orders of magnitude larger than typical context windows.
- Upload documents up to 1,000+ pages (PDF, TXT, MD, DOCX)
- Ask natural language questions across one or many documents
- Run medical-legal reviews — automated 5-step pipeline with per-facility map-reduce
- Search PubMed (36M+ citations), ClinicalTrials.gov, and case law (Midpage) mid-answer
- Get precise answers with `[Source: filename, Page N]` citations; click one to view the original passage
- Track API costs per query in real time
- Med-legal expert witness: Full standard-of-care review across 1,000+ pages from multiple facilities
- Clinical research: Synthesize findings across uploaded docs + PubMed literature
- Clinical trial matching: Upload patient records, auto-extract diagnosis, find eligible trials
- Legal research: Search case law and analyze judicial opinions alongside medical records
- General knowledge work: Any professional who needs precision over large document sets
Notebook workspace — document panel with page counts, token totals, and conversation history
Query with inline citations — click any citation button to view the source passage
Citation viewer — original passage highlighted with page navigation
flowchart TB
subgraph Frontend["Frontend (React + TypeScript + Tailwind)"]
UI[Notebook Workspace]
DP[Document Panel]
CP[Chat Panel]
CV[Citation Viewer]
end
subgraph Backend["Backend (Python + FastAPI)"]
ING[Document Ingestion<br/>PDF · TXT · MD · DOCX]
CDI[Cross-Document Indexer<br/>classify · extract · timeline]
SR[Smart Router]
RLM[RLM Engine<br/>Root LM + Sub LM + REPL]
RP[Review Pipeline<br/>5-step med-legal analysis]
AH[Anti-Hallucination Stack<br/>refusal · consistency · temporal]
CT[Citation Extractor + Normalizer]
end
subgraph External["External Research"]
PM[PubMed<br/>36M+ citations]
CTG[ClinicalTrials.gov]
MP[Midpage Legal<br/>case law search]
end
UI --> ING
UI --> SR
SR -->|"< 150K tokens"| API[Direct Claude API]
SR -->|">= 150K tokens"| RLM
SR -->|"med-legal review"| RP
RP --> RLM
RLM --> PM & CTG & MP
RLM --> CT
RP --> AH
CDI --> RP
ING --> CDI
CT --> CV
| Context Size | Strategy | Why |
|---|---|---|
| < 150K tokens | Direct Claude API call | Faster, cheaper for shorter inputs |
| >= 150K tokens | RLM Engine (recursive navigation) | Agentic code-driven navigation across massive contexts |
| Medical-legal review | Review Pipeline (5-step) | Structured multi-pass analysis with per-facility map-reduce |
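The routing table above can be sketched as a small decision function. This is a minimal illustration, not the project's actual router: the names `choose_route` and `RouteDecision` are hypothetical, and the assumption that a med-legal review request takes precedence over the token count is inferred from the diagram rather than confirmed by the source. Only the 150K-token threshold comes from the table.

```python
from dataclasses import dataclass

# Threshold taken from the routing table; everything else here is illustrative.
DIRECT_API_TOKEN_LIMIT = 150_000


@dataclass(frozen=True)
class RouteDecision:
    strategy: str  # "direct", "rlm", or "review_pipeline"
    reason: str


def choose_route(total_tokens: int, is_med_legal_review: bool = False) -> RouteDecision:
    """Pick a strategy from the total context size and the request type."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    # Assumption: an explicit review request wins regardless of context size.
    if is_med_legal_review:
        return RouteDecision("review_pipeline", "structured 5-step med-legal analysis")
    if total_tokens < DIRECT_API_TOKEN_LIMIT:
        return RouteDecision("direct", "context fits a single Claude API call")
    return RouteDecision("rlm", "context needs recursive, code-driven navigation")
```

Note that the boundary is inclusive on the RLM side: exactly 150,000 tokens routes to the RLM Engine, matching the `>= 150K` row.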
- Root LM (Claude Sonnet 4) examines document structure via generated Python code
- Sub LM (Claude Haiku 4.5) analyzes individual passages for semantic understanding
- External tools available in the REPL: PubMed, ClinicalTrials.gov, Midpage legal research
- Code execution filters, searches, and navigates — the model decides HOW to traverse
- All claims require citations, verified against source text
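As a rough illustration of the last point, the sketch below pulls `[Source: filename, Page N]` citations out of an answer and flags any that do not resolve to an ingested page. It is an assumption about how such a check could look, not SourceMind's citation extractor: `extract_citations`, `unresolved_citations`, and the `pages` mapping are hypothetical names, and a real verifier would presumably also compare the claim against the page text.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Matches the citation format shown in this README: [Source: filename, Page N]
CITATION_RE = re.compile(
    r"\[Source:\s*(?P<filename>[^,\]]+?),\s*Page\s+(?P<page>\d+)\]"
)


class Citation(NamedTuple):
    filename: str
    page: int


def extract_citations(answer: str) -> List[Citation]:
    """Return every citation in the answer, in order of appearance."""
    return [
        Citation(m["filename"].strip(), int(m["page"]))
        for m in CITATION_RE.finditer(answer)
    ]


def unresolved_citations(
    answer: str, pages: Dict[Tuple[str, int], str]
) -> List[Citation]:
    """Return citations that point at a (filename, page) pair we never ingested.

    `pages` is a hypothetical index mapping (filename, page number) to page text.
    """
    return [c for c in extract_citations(answer) if (c.filename, c.page) not in pages]
```

An answer whose `unresolved_citations` list is non-empty could be rejected or sent back for regeneration before it reaches the user.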
Input: "Did the patient meet the diagnostic criteria for incomplete Kawasaki disease at any facility visit?"
Output (excerpt): The patient presented with fever of 5 days duration and CRP of 8.2 mg/dL (Page 15, TGH_Record.pdf), meeting the entry criteria of CRP ≥3.0 mg/dL [FACT]. Of the 6 supplementary laboratory criteria, 3 were met: Albumin 2.8 g/dL ≤3.0 (Page 18), WBC 16,200/uL ≥15,000 (Page 15), and Urine WBC 12/HPF ≥10 (Page 22) [FACT]. The AHA guideline threshold of ≥3 of 6 criteria was reached [FACT]. CONFIDENCE: 8/10 (Citation: 9, Guideline: 9, Allegations: 7, Corroboration: 7)
Input: "Search PubMed for the latest meta-analyses on IV thrombolysis window extension beyond 4.5 hours"
Output (excerpt): A 2024 individual patient data meta-analysis (PMID: 38901847) pooling 6 RCTs (n=3,412) found that alteplase administered 4.5–9 hours after onset in patients selected by perfusion imaging had significantly better functional outcomes (mRS 0-1: OR 1.49, 95% CI 1.10–2.01) [PMID: 38901847].
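For context on the PubMed example, a literature search of this kind goes through NCBI's E-utilities ESearch endpoint. The sketch below only builds the request URL and does not send it. The helper name `build_pubmed_search_url` is hypothetical and is not taken from SourceMind's code; the endpoint and the `db`, `term`, `retmode`, `retmax`, `api_key`, and `email` parameters are standard E-utilities parameters.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(
    term: str,
    max_results: int = 10,
    api_key: Optional[str] = None,
    email: Optional[str] = None,
) -> str:
    """Build an ESearch URL that returns matching PubMed IDs as JSON."""
    if not term.strip():
        raise ValueError("term must not be empty")
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": str(max_results),
    }
    # Both values are optional; an API key raises the allowed request rate.
    if api_key:
        params["api_key"] = api_key
    if email:
        params["email"] = email
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

The returned IDs would then be passed to a second E-utilities call (ESummary or EFetch) to retrieve titles and abstracts.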
- Python 3.11+
- Node.js 18+
- An Anthropic API key
git clone https://github.com/rdmgator12/SourceMind.git
cd SourceMind

cd backend
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY
pip install -r requirements.lock # or requirements.txt for latest compatible
uvicorn app.main:app --reload

Backend runs at http://localhost:8000
cd frontend
npm install
npm run dev

Frontend runs at http://localhost:3000 (proxies API calls to backend)
cp backend/.env.example backend/.env
# Edit backend/.env with your API key
docker-compose up --build

App available at http://localhost:3000
| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_API_KEY` | — | Your Anthropic API key (required) |
| `ROOT_MODEL` | `claude-sonnet-4-20250514` | Root LM for orchestration |
| `SUB_MODEL` | `claude-haiku-4-5-20251001` | Sub LM for recursive passage analysis |
| `RLM_ENVIRONMENT` | `local` | REPL sandbox (`local` or `docker`) |
| `RLM_MAX_RECURSION_DEPTH` | `3` | Max recursive depth per query |
| `RLM_TIMEOUT_SECONDS` | `900` | Max execution time per query |
| `RLM_MAX_BUDGET_USD` | `10.00` | Max API spend per query |
| `RLM_PER_STEP_BUDGET_USD` | `3.00` | Max API spend per pipeline step |
| `MAX_FILE_SIZE_MB` | `150` | Upload size limit |
| `ALLOWED_EXTENSIONS` | `pdf,txt,md,docx` | Accepted file types |
| `NCBI_API_KEY` | — | Optional NCBI key (increases PubMed rate limit to 10/sec) |
| `NCBI_EMAIL` | — | Optional email for NCBI API usage |
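To show how these variables and defaults fit together, here is a minimal sketch of a settings loader. It is not the backend's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and default values are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: str
    root_model: str
    sub_model: str
    rlm_environment: str
    max_recursion_depth: int
    timeout_seconds: int
    max_budget_usd: float
    per_step_budget_usd: float
    max_file_size_mb: int
    allowed_extensions: FrozenSet[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the documented defaults."""
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is required")
    environment = env.get("RLM_ENVIRONMENT", "local")
    if environment not in {"local", "docker"}:
        raise ValueError("RLM_ENVIRONMENT must be 'local' or 'docker'")
    extensions = env.get("ALLOWED_EXTENSIONS", "pdf,txt,md,docx")
    return Settings(
        anthropic_api_key=api_key,
        root_model=env.get("ROOT_MODEL", "claude-sonnet-4-20250514"),
        sub_model=env.get("SUB_MODEL", "claude-haiku-4-5-20251001"),
        rlm_environment=environment,
        max_recursion_depth=int(env.get("RLM_MAX_RECURSION_DEPTH", "3")),
        timeout_seconds=int(env.get("RLM_TIMEOUT_SECONDS", "900")),
        max_budget_usd=float(env.get("RLM_MAX_BUDGET_USD", "10.00")),
        per_step_budget_usd=float(env.get("RLM_PER_STEP_BUDGET_USD", "3.00")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "150")),
        allowed_extensions=frozenset(
            e.strip().lower() for e in extensions.split(",") if e.strip()
        ),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.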
POST /api/notebooks Create notebook
GET /api/notebooks List notebooks
GET /api/notebooks/:id Get notebook
DELETE /api/notebooks/:id Delete notebook
POST /api/notebooks/:id/documents Upload document
GET /api/notebooks/:id/documents List documents
DELETE /api/notebooks/:id/documents/:did Remove document
GET /api/documents/:did/page/:page Get page text
POST /api/notebooks/:id/query Submit query (multi-source)
POST /api/notebooks/:id/review Run medical-legal review
GET /api/notebooks/:id/conversations List conversations
GET /api/conversations/:cid Get conversation
GET /api/stats Usage stats
WS /ws/query/:notebook_id Streaming query via WebSocket
WS /ws/review/:notebook_id Streaming review via WebSocket
Interactive API docs (Swagger UI) available at http://localhost:8000/docs when running locally.
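As a starting point for scripting against the API, the sketch below builds a request for the `POST /api/notebooks/:id/query` endpoint using only the standard library. The request body is an assumption: this README does not document the schema, so the `question` field is hypothetical and should be checked against the Swagger UI before use. The function name `build_query_request` is also illustrative.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # local backend address from the setup steps


def build_query_request(
    notebook_id: str, question: str, base_url: str = BASE_URL
) -> Request:
    """Build (but do not send) a POST request for the notebook query endpoint."""
    # Assumption: the endpoint accepts a JSON body with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        url=f"{base_url}/api/notebooks/{notebook_id}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the request can be sent with `urllib.request.urlopen(build_query_request(notebook_id, question))` and the response body decoded as JSON.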
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Tailwind CSS, Zustand |
| Backend | Python 3.12, FastAPI, SQLAlchemy, aiosqlite |
| RLM | rlms library (MIT), Anthropic backend |
| Document Parsing | PyMuPDF (PDF), python-docx (DOCX) |
| Literature Search | PubMed E-utilities (NCBI) |
| Trial Matching | ClinicalTrials.gov v2 API |
| Legal Research | Midpage Legal Research (MCP) |
| LLMs | Claude Sonnet 4 (root) + Haiku 4.5 (sub) |
| Deploy | Docker Compose, Nginx |
cd backend
python -m pytest tests/ -v

296 tests covering: review pipeline, cross-document indexer, document selector, citation normalization, smart router, ingestion pipeline, consistency checker, refusal detection, temporal guard, facility normalization, cost tracking, and adversarial edge cases.
CI runs automatically on every push via GitHub Actions.
See CONTRIBUTING.md for setup instructions, test requirements, and PR guidelines.
Business Source License 1.1 — free for non-competitive use; converts to Apache 2.0 on 2030-03-22. See LICENSE for full terms.
Built by Ralph Martello & Elle.