
SourceMind


SourceMind turns 1,000-page document sets into a question-answerable knowledge base with page-level citations — then augments answers with live PubMed, clinical trial, and legal research.

Built on Recursive Language Models (RLMs), not basic RAG. The RLM writes and executes Python code to navigate documents, handling inputs two orders of magnitude beyond typical context windows.


Key Capabilities

  • Upload documents up to 1,000+ pages (PDF, TXT, MD, DOCX)
  • Ask natural language questions across one or many documents
  • Run medical-legal reviews — automated 5-step pipeline with per-facility map-reduce
  • Search PubMed (36M+ citations), ClinicalTrials.gov, and case law (Midpage) mid-answer
  • Get precise answers with [Source: filename, Page N] citations — click to view the original passage
  • Track API costs per query in real time
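The bracketed citation format lends itself to simple post-hoc checking. As an illustration only (not SourceMind's actual extractor), a regex can pull `[Source: filename, Page N]` markers out of an answer:

```python
import re

# Matches the inline citation format: [Source: filename, Page N]
CITATION_RE = re.compile(r"\[Source:\s*(?P<file>[^,\]]+),\s*Page\s*(?P<page>\d+)\]")

def extract_citations(answer: str) -> list[tuple[str, int]]:
    """Return (filename, page) pairs for every citation marker in an answer."""
    return [(m.group("file").strip(), int(m.group("page")))
            for m in CITATION_RE.finditer(answer)]

answer = ("CRP was 8.2 mg/dL on admission [Source: TGH_Record.pdf, Page 15] "
          "and albumin was 2.8 g/dL [Source: TGH_Record.pdf, Page 18].")
print(extract_citations(answer))
```

Each extracted pair can then be resolved back to the stored page text, which is what makes the click-to-view citation buttons possible.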

Use Cases

  • Med-legal expert witness: Full standard-of-care review across 1,000+ pages from multiple facilities
  • Clinical research: Synthesize findings across uploaded docs + PubMed literature
  • Clinical trial matching: Upload patient records, auto-extract diagnosis, find eligible trials
  • Legal research: Search case law and analyze judicial opinions alongside medical records
  • General knowledge work: Any professional who needs precision over large document sets

Screenshots

Notebook workspace with 4 documents loaded
Notebook workspace — document panel with page counts, token totals, and conversation history

Query response with inline citations
Query with inline citations — click any citation button to view the source passage

Citation viewer panel showing source passage
Citation viewer — original passage highlighted with page navigation


Architecture

flowchart TB
    subgraph Frontend["Frontend (React + TypeScript + Tailwind)"]
        UI[Notebook Workspace]
        DP[Document Panel]
        CP[Chat Panel]
        CV[Citation Viewer]
    end

    subgraph Backend["Backend (Python + FastAPI)"]
        ING[Document Ingestion<br/>PDF · TXT · MD · DOCX]
        CDI[Cross-Document Indexer<br/>classify · extract · timeline]
        SR[Smart Router]
        RLM[RLM Engine<br/>Root LM + Sub LM + REPL]
        RP[Review Pipeline<br/>5-step med-legal analysis]
        AH[Anti-Hallucination Stack<br/>refusal · consistency · temporal]
        CT[Citation Extractor + Normalizer]
    end

    subgraph External["External Research"]
        PM[PubMed<br/>36M+ citations]
        CTG[ClinicalTrials.gov]
        MP[Midpage Legal<br/>case law search]
    end

    UI --> ING
    UI --> SR
    SR -->|"< 150K tokens"| API[Direct Claude API]
    SR -->|">= 150K tokens"| RLM
    SR -->|"med-legal review"| RP
    RP --> RLM
    RLM --> PM & CTG & MP
    RLM --> CT
    RP --> AH
    CDI --> RP
    ING --> CDI
    CT --> CV

Smart Router

| Context Size | Strategy | Why |
| --- | --- | --- |
| < 150K tokens | Direct Claude API call | Faster, cheaper for shorter inputs |
| >= 150K tokens | RLM Engine (recursive navigation) | Agentic, code-driven navigation across massive contexts |
| Medical-legal review | Review Pipeline (5-step) | Structured multi-pass analysis with per-facility map-reduce |
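The routing rule reduces to a review flag plus a token-count threshold. A minimal sketch, assuming the real Smart Router weighs additional signals; the names here are illustrative:

```python
from enum import Enum

TOKEN_THRESHOLD = 150_000  # routing cutoff from the table above

class Route(Enum):
    DIRECT_API = "direct_claude_api"
    RLM_ENGINE = "rlm_engine"
    REVIEW_PIPELINE = "review_pipeline"

def route_query(context_tokens: int, is_medlegal_review: bool) -> Route:
    """Pick a strategy: the review pipeline wins, then size-based routing."""
    if is_medlegal_review:
        return Route.REVIEW_PIPELINE
    if context_tokens < TOKEN_THRESHOLD:
        return Route.DIRECT_API
    return Route.RLM_ENGINE

print(route_query(90_000, False))   # small context: direct API
print(route_query(400_000, False))  # large context: RLM engine
```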

RLM Engine

  1. Root LM (Claude Sonnet 4) examines document structure via generated Python code
  2. Sub LM (Claude Haiku 4.5) analyzes individual passages for semantic understanding
  3. External tools available in the REPL: PubMed, ClinicalTrials.gov, Midpage legal research
  4. Code execution filters, searches, and navigates — the model decides HOW to traverse
  5. All claims require citations, verified against source text
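The root/sub split in the steps above can be sketched with stubbed model calls. This is illustrative only: the real engine runs model-generated Python in a REPL, whereas here the "root" step is a hard-coded keyword filter and the "sub" step a stub summary:

```python
# Illustrative sketch of root-LM navigation delegating passages to a sub LM.
MAX_RECURSION_DEPTH = 3  # mirrors the RLM_MAX_RECURSION_DEPTH setting

def sub_lm_analyze(passage: str, question: str) -> str:
    """Stand-in for the Haiku sub-LM call on a single passage."""
    return f"summary of {len(passage)} chars re: {question!r}"

def root_navigate(pages: list[str], question: str, depth: int = 0) -> list[str]:
    """Stand-in for root-LM navigation: filter pages, analyze each hit."""
    if depth >= MAX_RECURSION_DEPTH:
        return []
    keyword = question.split()[0].lower()
    findings = []
    for page in pages:
        if keyword in page.lower():
            findings.append(sub_lm_analyze(page, question))
    return findings

pages = ["Fever of 5 days, CRP 8.2 mg/dL", "Discharge summary", "fever resolved"]
print(root_navigate(pages, "fever criteria"))
```

The key property the sketch preserves is that navigation is code, not embedding similarity: the root step decides programmatically which passages are worth a sub-LM call, under a hard depth limit.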

Sample Queries

Input: "Did the patient meet the diagnostic criteria for incomplete Kawasaki disease at any facility visit?"

Output (excerpt): The patient presented with fever of 5 days duration and CRP of 8.2 mg/dL (Page 15, TGH_Record.pdf), meeting the entry criteria of CRP ≥3.0 mg/dL [FACT]. Of the 6 supplementary laboratory criteria, 3 were met: Albumin 2.8 g/dL ≤3.0 (Page 18), WBC 16,200/uL ≥15,000 (Page 15), and Urine WBC 12/HPF ≥10 (Page 22) [FACT]. The AHA guideline threshold of ≥3 of 6 criteria was reached [FACT]. CONFIDENCE: 8/10 (Citation: 9, Guideline: 9, Allegations: 7, Corroboration: 7)

Input: "Search PubMed for the latest meta-analyses on IV thrombolysis window extension beyond 4.5 hours"

Output (excerpt): A 2024 individual patient data meta-analysis (PMID: 38901847) pooling 6 RCTs (n=3,412) found that alteplase administered 4.5–9 hours after onset in patients selected by perfusion imaging had significantly better functional outcomes (mRS 0-1: OR 1.49, 95% CI 1.10–2.01) [PMID: 38901847].


Quick Start

Prerequisites: Python 3.12, Node.js with npm, and an Anthropic API key (Docker optional, for production deployment).

1. Clone

git clone https://github.com/rdmgator12/SourceMind.git
cd SourceMind

2. Backend

cd backend
cp .env.example .env
# Edit .env and add your ANTHROPIC_API_KEY

pip install -r requirements.lock  # or requirements.txt for latest compatible
uvicorn app.main:app --reload

Backend runs at http://localhost:8000

3. Frontend

cd frontend
npm install
npm run dev

Frontend runs at http://localhost:3000 (proxies API calls to backend)

Docker (Production)

cp backend/.env.example backend/.env
# Edit backend/.env with your API key

docker-compose up --build

App available at http://localhost:3000


Configuration

| Variable | Default | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | | Your Anthropic API key (required) |
| ROOT_MODEL | claude-sonnet-4-20250514 | Root LM for orchestration |
| SUB_MODEL | claude-haiku-4-5-20251001 | Sub LM for recursive passage analysis |
| RLM_ENVIRONMENT | local | REPL sandbox (local or docker) |
| RLM_MAX_RECURSION_DEPTH | 3 | Max recursion depth per query |
| RLM_TIMEOUT_SECONDS | 900 | Max execution time per query |
| RLM_MAX_BUDGET_USD | 10.00 | Max API spend per query |
| RLM_PER_STEP_BUDGET_USD | 3.00 | Max API spend per pipeline step |
| MAX_FILE_SIZE_MB | 150 | Upload size limit |
| ALLOWED_EXTENSIONS | pdf,txt,md,docx | Accepted file types |
| NCBI_API_KEY | | Optional NCBI key (raises PubMed rate limit to 10 req/sec) |
| NCBI_EMAIL | | Optional email for NCBI API usage |

API

POST   /api/notebooks                         Create notebook
GET    /api/notebooks                         List notebooks
GET    /api/notebooks/:id                     Get notebook
DELETE /api/notebooks/:id                     Delete notebook

POST   /api/notebooks/:id/documents           Upload document
GET    /api/notebooks/:id/documents           List documents
DELETE /api/notebooks/:id/documents/:did      Remove document
GET    /api/documents/:did/page/:page         Get page text

POST   /api/notebooks/:id/query              Submit query (multi-source)
POST   /api/notebooks/:id/review             Run medical-legal review
GET    /api/notebooks/:id/conversations       List conversations
GET    /api/conversations/:cid               Get conversation

GET    /api/stats                             Usage stats

WS     /ws/query/:notebook_id                 Streaming query via WebSocket
WS     /ws/review/:notebook_id               Streaming review via WebSocket

Interactive API docs (Swagger UI) available at http://localhost:8000/docs when running locally.
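The query endpoint can be exercised from any HTTP client. A stdlib-only sketch that prepares (but, for illustration, does not send) a request; the JSON body field is a guess, since the request schema isn't shown here — check the Swagger docs for the real one:

```python
import json
from urllib.request import Request

BASE = "http://localhost:8000"

def build_query_request(notebook_id: str, question: str) -> Request:
    """Prepare a POST to /api/notebooks/:id/query (body field is hypothetical)."""
    body = json.dumps({"question": question}).encode()
    return Request(
        f"{BASE}/api/notebooks/{notebook_id}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("nb-123", "Summarize the admission notes")
print(req.full_url, req.method)
```

Sending it is a `urllib.request.urlopen(req)` call against a running backend; for streaming results, use the WebSocket endpoints instead.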


Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | React 18, TypeScript, Tailwind CSS, Zustand |
| Backend | Python 3.12, FastAPI, SQLAlchemy, aiosqlite |
| RLM | rlms library (MIT), Anthropic backend |
| Document Parsing | PyMuPDF (PDF), python-docx (DOCX) |
| Literature Search | PubMed E-utilities (NCBI) |
| Trial Matching | ClinicalTrials.gov v2 API |
| Legal Research | Midpage Legal Research (MCP) |
| LLMs | Claude Sonnet 4 (root) + Haiku 4.5 (sub) |
| Deploy | Docker Compose, Nginx |

Testing

cd backend
python -m pytest tests/ -v

296 tests covering: review pipeline, cross-document indexer, document selector, citation normalization, smart router, ingestion pipeline, consistency checker, refusal detection, temporal guard, facility normalization, cost tracking, and adversarial edge cases.

CI runs automatically on every push via GitHub Actions.


Contributing

See CONTRIBUTING.md for setup instructions, test requirements, and PR guidelines.


License

Business Source License 1.1 — free for non-competitive use; converts to Apache 2.0 on 2030-03-22. See LICENSE for full terms.


Built by Ralph Martello & Elle.
