Merged
22 changes: 22 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,22 @@
version: 2
updates:
  - package-ecosystem: "npm"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
    open-pull-requests-limit: 10
    labels:
      - "dependencies"
    commit-message:
      prefix: "deps"

  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
    labels:
      - "ci"
    commit-message:
      prefix: "ci"
36 changes: 36 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,36 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20, 22]

    steps:
      - uses: actions/checkout@v4

      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci --legacy-peer-deps

      - name: Type check
        run: npx tsc --noEmit

      - name: Run tests
        run: npm test

      - name: Security audit
        run: npm audit --audit-level=high --omit=dev
        continue-on-error: true
31 changes: 31 additions & 0 deletions .github/workflows/codeql.yml
@@ -0,0 +1,31 @@
name: CodeQL

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1' # Weekly on Monday at 6am UTC

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      actions: read
      contents: read

    steps:
      - uses: actions/checkout@v4

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript-typescript

      - name: Autobuild
        uses: github/codeql-action/autobuild@v3

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
35 changes: 35 additions & 0 deletions .github/workflows/scorecard.yml
@@ -0,0 +1,35 @@
name: OpenSSF Scorecard

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1' # Weekly on Monday at 6am UTC

permissions: read-all

jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
      id-token: write
      contents: read
      actions: read

    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false

      - name: Run Scorecard
        uses: ossf/scorecard-action@v2.4.0
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true

      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
15 changes: 8 additions & 7 deletions CLAUDE.md
@@ -1,12 +1,12 @@
# CodeCortex

Persistent, AI-powered codebase knowledge layer. Pre-digests codebases into structured knowledge and serves to AI agents via MCP.
Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.

## Stack
- TypeScript, ESM (`"type": "module"`)
- tree-sitter (native N-API) + 27 language grammar packages
- @modelcontextprotocol/sdk - MCP server (stdio transport)
- commander - CLI (init, serve, update, status)
- commander - CLI (init, serve, update, status, symbols, search, modules, hotspots, hook, upgrade)
- simple-git - git integration + temporal analysis
- zod - schema validation for LLM analysis results
- yaml - cortex.yaml manifest
@@ -49,11 +49,12 @@ Hybrid extraction:
- `codecortex hook install|uninstall|status` - manage git hooks for auto-update
- `codecortex upgrade` - check for and install latest version

## MCP Tools (15)
## MCP Tools (13)
Read (10): get_project_overview, get_module_context, get_session_briefing, search_knowledge, get_decision_history, get_dependency_graph, lookup_symbol, get_change_coupling, get_hotspots, get_edit_briefing
Write (5): analyze_module, save_module_analysis, record_decision, update_patterns, report_feedback
Write (3): record_decision, update_patterns, record_observation

All read tools include `_freshness` metadata (status, lastAnalyzed, filesChangedSince, changedFiles, message).
All read tools return context-safe responses (<10K chars) via truncation utilities in `src/utils/truncate.ts`.
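
A minimal TypeScript sketch of how a read tool might attach the `_freshness` metadata described above. Only the five field names come from these notes; the status values (`fresh`, `slightly_stale`, `stale`), the thresholds, the example file paths, and the helper names `computeFreshness` and `withFreshness` are assumptions for illustration, not the code in `src/`.

```typescript
// Hypothetical `_freshness` shape. Field names follow the project notes;
// the types, status values, and thresholds below are assumptions.
interface Freshness {
  status: "fresh" | "slightly_stale" | "stale";
  lastAnalyzed: string; // ISO-8601 timestamp of the last analysis
  filesChangedSince: number;
  changedFiles: string[];
  message: string;
}

const SLIGHTLY_STALE_MAX = 5; // assumed: up to 5 changed files counts as "slightly stale"
const MAX_LISTED_FILES = 10; // assumed: cap the listed files to keep responses small

function computeFreshness(lastAnalyzed: Date, changedFiles: string[]): Freshness {
  const count = changedFiles.length;
  const status: Freshness["status"] =
    count === 0 ? "fresh" : count <= SLIGHTLY_STALE_MAX ? "slightly_stale" : "stale";
  return {
    status,
    lastAnalyzed: lastAnalyzed.toISOString(),
    filesChangedSince: count,
    changedFiles: changedFiles.slice(0, MAX_LISTED_FILES),
    message:
      count === 0
        ? "Knowledge is up to date."
        : `${count} file(s) changed since the last analysis; run \`codecortex update\` to refresh.`,
  };
}

// Attach the metadata to any read-tool payload under the `_freshness` key.
function withFreshness<T extends object>(payload: T, freshness: Freshness): T & { _freshness: Freshness } {
  return { ...payload, _freshness: freshness };
}

// Usage with hypothetical file paths.
const freshness = computeFreshness(new Date("2025-01-01T00:00:00Z"), ["src/a.ts", "src/b.ts"]);
const response = withFreshness({ modules: ["cli", "mcp"] }, freshness);
console.log(response._freshness.status); // slightly_stale
```

Computing freshness once and wrapping each payload the same way keeps the metadata identical across all read tools.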

## Pre-Publish Checklist
Run ALL of these before `npm publish`. Do not skip any step.
@@ -71,7 +72,7 @@ Run ALL of these before `npm publish`. Do not skip any step.
- **Grammar smoke test** (`parser.test.ts`): Loads every language in `LANGUAGE_LOADERS` via `parseSource()`. Catches missing packages, broken native builds, wrong require paths. This is what would have caught the tree-sitter-liquid issue.
- **Version-check tests**: Update notification, cache lifecycle, PM detection, upgrade commands.
- **Hook tests**: Git hook install/uninstall/status integration tests.
- **MCP tests**: All 15 tools (read + write), simulation tests.
- **MCP tests**: All 13 tools (read + write), simulation tests.

### Known limitations
- tree-sitter native bindings don't compile on Node 24 yet (upstream issue)
@@ -90,11 +91,11 @@ Run ALL of these before `npm publish`. Do not skip any step.
src/
cli/ - commander CLI (init, serve, update, status)
mcp/ - MCP server + tools
core/ - knowledge store (graph, modules, decisions, sessions, patterns, constitution, search)
core/ - knowledge store (graph, modules, decisions, sessions, patterns, constitution, search, agent-instructions, freshness)
extraction/ - tree-sitter native N-API (parser, symbols, imports, calls)
git/ - git diff, history, temporal analysis
types/ - TypeScript types + Zod schemas
utils/ - file I/O, YAML, markdown helpers
utils/ - file I/O, YAML, markdown helpers, truncation
```

## Temporal Analysis
130 changes: 80 additions & 50 deletions README.md
@@ -1,9 +1,12 @@
# CodeCortex

Persistent codebase knowledge layer for AI agents. Your AI shouldn't re-learn your codebase every session.
Codebase navigation and risk layer for AI agents. Pre-builds a map of architecture, dependencies, coupling, and risk areas so agents go straight to the right files.

> **⚠️ If you're on v0.4.3 or earlier, update now:** `npm install -g codecortex-ai@latest`
> v0.4.4 adds freshness flags on all MCP responses and `get_edit_briefing` — a pre-edit risk briefing tool.
[![CI](https://github.com/rushikeshmore/CodeCortex/actions/workflows/ci.yml/badge.svg)](https://github.com/rushikeshmore/CodeCortex/actions/workflows/ci.yml)
[![npm version](https://img.shields.io/npm/v/codecortex-ai)](https://www.npmjs.com/package/codecortex-ai)
[![npm downloads](https://img.shields.io/npm/dw/codecortex-ai)](https://www.npmjs.com/package/codecortex-ai)
[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/rushikeshmore/CodeCortex/badge)](https://scorecard.dev/viewer/?uri=github.com/rushikeshmore/CodeCortex)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/rushikeshmore/CodeCortex/blob/main/LICENSE)

[Website](https://codecortex-ai.vercel.app) · [npm](https://www.npmjs.com/package/codecortex-ai) · [GitHub](https://github.com/rushikeshmore/CodeCortex)

@@ -13,18 +16,40 @@ Persistent codebase knowledge layer for AI agents. Your AI shouldn't re-learn yo

## The Problem

Every AI coding session starts from scratch. When context compacts or a new session begins, the AI re-scans the entire codebase. Same files, same tokens, same wasted time. It's like hiring a new developer every session who has to re-learn everything before writing a single line.
Every AI coding session starts with exploration — grepping, reading wrong files, re-discovering architecture. On a 6,000-file codebase, an agent makes 37 tool calls and burns 79K tokens just to understand what's where. And it still can't tell you which files are dangerous to edit or which files secretly depend on each other.

**The data backs this up:**
- AI agents increase defect risk by 30% on unfamiliar code ([CodeScene + Lund University, 2025](https://codescene.com/hubfs/whitepapers/AI-Ready-Code-How-Code-Health-Determines-AI-Performance.pdf))
- Code churn grew 2.5x in the AI era ([GitClear, 211M lines analyzed](https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality))
- Nobody combines structural + semantic + temporal + decision knowledge in one portable tool

## The Solution

CodeCortex pre-digests codebases into layered knowledge files and serves them to any AI agent via MCP. Instead of re-understanding your codebase every session, the AI starts with knowledge.
CodeCortex gives agents a pre-built map: architecture, dependencies, risk areas, hidden coupling. The agent goes straight to the right files and starts working.

**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.
**CodeCortex finds WHERE to look. Your agent still reads the code.**

Tested on a real 6,400-file codebase (143K symbols, 96 modules):

| | Without CodeCortex | With CodeCortex |
|--|:--:|:--:|
| Tool calls | 37 | **15** (2.5x fewer) |
| Total tokens | 79K | **43K** (~45% fewer) |
| Answer quality | 23/25 | **23/25** (same) |
| Hidden dependencies found | No | **Yes** |

### What makes it unique

Three capabilities no other tool provides:

1. **Temporal coupling** — Files that always change together but have zero imports between them. You can read every line and never discover this. Only git co-change analysis reveals it.

2. **Risk scores** — File X has been bug-fixed 7 times, has 6 hidden dependencies, and co-changes with 3 other files. Risk score: 35. You can't learn this from reading code.

3. **Cross-session memory** — Decisions, patterns, observations persist. The agent doesn't start from zero each session.

**Example from a real codebase:**
- `schema.help.ts` and `schema.labels.ts` co-changed in 12/14 commits (86%) with **zero imports between them**
- Without this knowledge, an AI that edits one file and not the other would miss a change that history says is needed 86% of the time
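
The co-change analysis behind this example can be sketched in a few lines. This is an illustration under stated assumptions, not CodeCortex's implementation: commits are given as lists of touched paths (as you might collect from `git log --name-only`), imports as `[from, to]` pairs, and coupling strength is taken to be co-changes divided by commits touching either file, which is one way to read "12/14 commits". `findCoupling` and its thresholds are hypothetical.

```typescript
// Minimal temporal-coupling sketch. Assumptions (not CodeCortex's code):
// commits are lists of touched paths, imports are [from, to] pairs, and
// strength = co-changes / commits touching either file. Thresholds are illustrative.
interface Commit {
  files: string[];
}

interface CouplingPair {
  fileA: string;
  fileB: string;
  coChanges: number; // commits touching both files
  strength: number; // coChanges / commits touching either file
  hidden: boolean; // true when neither file imports the other
}

function findCoupling(
  commits: Commit[],
  imports: Array<[string, string]>,
  minCoChanges = 3,
  minStrength = 0.5,
): CouplingPair[] {
  const perFile = new Map<string, number>();
  const perPair = new Map<string, { fileA: string; fileB: string; count: number }>();

  for (const commit of commits) {
    const files = Array.from(new Set(commit.files)).sort();
    for (const f of files) perFile.set(f, (perFile.get(f) ?? 0) + 1);
    // Count each unordered pair once per commit (files are sorted, so i < j is canonical).
    for (let i = 0; i < files.length; i++) {
      for (let j = i + 1; j < files.length; j++) {
        const key = `${files[i]}\u0000${files[j]}`;
        const entry = perPair.get(key);
        if (entry) entry.count += 1;
        else perPair.set(key, { fileA: files[i], fileB: files[j], count: 1 });
      }
    }
  }

  const importEdges = new Set(imports.map(([from, to]) => `${from}\u0000${to}`));
  const result: CouplingPair[] = [];
  for (const { fileA, fileB, count } of Array.from(perPair.values())) {
    const either = (perFile.get(fileA) ?? 0) + (perFile.get(fileB) ?? 0) - count;
    const strength = count / either;
    if (count < minCoChanges || strength < minStrength) continue;
    const hidden =
      !importEdges.has(`${fileA}\u0000${fileB}`) && !importEdges.has(`${fileB}\u0000${fileA}`);
    result.push({ fileA, fileB, coChanges: count, strength, hidden });
  }
  return result.sort((x, y) => y.strength - x.strength);
}

// Usage: 12 commits touch both files, plus one commit each touching only one of them.
const commits: Commit[] = [
  ...Array.from({ length: 12 }, () => ({ files: ["schema.help.ts", "schema.labels.ts"] })),
  { files: ["schema.help.ts"] },
  { files: ["schema.labels.ts"] },
];
const [top] = findCoupling(commits, []);
console.log(top.coChanges, top.strength.toFixed(2), top.hidden); // 12 0.86 true
```

A fuller version would likely also skip very large commits (bulk renames, formatting sweeps), since each one adds a quadratic number of pairs and mostly contributes noise.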

## Quick Start

@@ -38,17 +63,32 @@ npm install -g codecortex-ai --legacy-peer-deps
cd /path/to/your-project
codecortex init

# Start MCP server (for AI agent access)
codecortex serve

# Check knowledge freshness
codecortex status
```

### Connect to Claude Code

Add to your MCP config:
**CLI (recommended):**
```bash
claude mcp add codecortex -- codecortex serve
```

**Or add to MCP config manually:**
```json
{
"mcpServers": {
"codecortex": {
"command": "codecortex",
"args": ["serve"],
"cwd": "/path/to/your-project"
}
}
}
```

### Connect to Cursor
Add to `.cursor/mcp.json`:
```json
{
"mcpServers": {
@@ -73,7 +113,8 @@ All knowledge lives in `.codecortex/` as flat files in your repo:
graph.json # dependency graph (imports, calls, modules)
symbols.json # full symbol index (functions, classes, types...)
temporal.json # git coupling, hotspots, bug history
modules/*.md # per-module deep analysis
AGENT.md # tool usage guide for AI agents
modules/*.md # per-module structural analysis
decisions/*.md # architectural decision records
sessions/*.md # session change logs
patterns.md # coding patterns and conventions
@@ -85,47 +126,42 @@ All knowledge lives in `.codecortex/` as flat files in your repo:
|-------|------|------|
| 1. Structural | Modules, deps, symbols, entry points | `graph.json` + `symbols.json` |
| 2. Semantic | What each module does, data flow, gotchas | `modules/*.md` |
| 3. Temporal | Git behavioral fingerprint - coupling, hotspots, bug history | `temporal.json` |
| 3. Temporal | Git behavioral fingerprint: coupling, hotspots, bug history | `temporal.json` |
| 4. Decisions | Why things are built this way | `decisions/*.md` |
| 5. Patterns | How code is written here | `patterns.md` |
| 6. Sessions | What changed between sessions | `sessions/*.md` |

### The Temporal Layer

This is the killer differentiator. The temporal layer tells agents *"if you touch file X, you MUST also touch file Y"* even when there's no import between them. This comes from git co-change analysis, not static code analysis.
## MCP Tools (13)

Example from a real codebase:
- `routes.ts` and `worker.ts` co-changed in 9/12 commits (75%) with **zero imports between them**
- Without this knowledge, an AI editing one file would produce a bug 75% of the time
### Navigation — "Where should I look?" (4 tools)

## MCP Tools (15)
| Tool | Description |
|------|-------------|
| `get_project_overview` | Architecture, modules, risk map. Call this first. |
| `search_knowledge` | Find where a function/class/type is DEFINED by name. Ranked results. |
| `lookup_symbol` | Precise symbol lookup with kind and file path filters. |
| `get_module_context` | Module files, deps, temporal signals. Zoom into a module. |

### Read Tools (10)
### Risk — "What could go wrong?" (4 tools)

| Tool | Description |
|------|-------------|
| `get_project_overview` | Constitution + overview + graph summary |
| `get_module_context` | Module doc by name, includes temporal signals |
| `get_session_briefing` | Changes since last session |
| `search_knowledge` | Keyword search across all knowledge |
| `get_decision_history` | Decision records filtered by topic |
| `get_dependency_graph` | Import/export graph, filterable |
| `lookup_symbol` | Symbol by name/file/kind |
| `get_change_coupling` | What files must I also edit if I touch X? |
| `get_hotspots` | Files ranked by risk (churn x coupling) |
| `get_edit_briefing` | **NEW** — Pre-edit risk briefing: co-change warnings, hidden deps, bug history, importers |
| `get_edit_briefing` | Pre-edit risk: co-change warnings, hidden deps, bug history. **Always call before editing.** |
| `get_hotspots` | Files ranked by risk (churn x coupling x bugs). |
| `get_change_coupling` | Files that must change together. Hidden dependencies flagged. |
| `get_dependency_graph` | Import/export graph filtered by module or file. |
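
One way to combine those risk signals into a single ranking is sketched below. The inputs mirror what the README names (churn, coupling, hidden dependencies, bug history), but the weighted-sum formula, the weights, and `rankHotspots` are assumptions; the tool descriptions don't spell out the actual scoring, and "churn x coupling x bugs" may well mean a product rather than a sum.

```typescript
// Hypothetical hotspot ranking. The weighted-sum formula and weights are
// assumptions, not CodeCortex's actual scoring.
interface FileStats {
  path: string;
  commits: number; // churn: commits that touched the file
  coupledFiles: number; // files it frequently co-changes with
  hiddenDeps: number; // coupled files with no import link in either direction
  bugFixes: number; // commits classified as bug fixes
}

interface Hotspot {
  path: string;
  risk: number;
}

// Assumed weights: hidden dependencies and bug fixes count more than raw churn.
const WEIGHTS = { commits: 0.5, coupledFiles: 1, hiddenDeps: 2, bugFixes: 3 };

function rankHotspots(stats: FileStats[], limit = 10): Hotspot[] {
  return stats
    .map((s) => ({
      path: s.path,
      risk:
        s.commits * WEIGHTS.commits +
        s.coupledFiles * WEIGHTS.coupledFiles +
        s.hiddenDeps * WEIGHTS.hiddenDeps +
        s.bugFixes * WEIGHTS.bugFixes,
    }))
    // Highest risk first; ties broken by path for stable output.
    .sort((a, b) => b.risk - a.risk || a.path.localeCompare(b.path))
    .slice(0, limit);
}

// Usage with made-up numbers.
const ranked = rankHotspots([
  { path: "src/a.ts", commits: 10, coupledFiles: 3, hiddenDeps: 2, bugFixes: 1 },
  { path: "src/b.ts", commits: 4, coupledFiles: 6, hiddenDeps: 6, bugFixes: 7 },
]);
console.log(ranked.map((h) => `${h.path}: ${h.risk}`).join(", ")); // src/b.ts: 41, src/a.ts: 15
```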

All read tools include `_freshness` metadata indicating how up-to-date the knowledge is.

### Write Tools (5)
### Memory — "Remember this" (5 tools)

| Tool | Description |
|------|-------------|
| `analyze_module` | Returns source files + structured prompt for LLM analysis |
| `save_module_analysis` | Persists LLM analysis to `modules/*.md` |
| `record_decision` | Saves architectural decision to `decisions/*.md` |
| `update_patterns` | Merges coding pattern into `patterns.md` |
| `report_feedback` | Agent reports incorrect knowledge for next analysis |
| `get_session_briefing` | What changed since the last session. |
| `get_decision_history` | Why things were built this way. |
| `record_decision` | Save an architectural decision. |
| `update_patterns` | Document coding conventions. |
| `record_observation` | Record anything you learned about the codebase. |

All read tools include `_freshness` metadata and return context-safe responses (<10K chars) via size-adaptive caps.

## CLI Commands

@@ -136,25 +172,19 @@ All read tools include `_freshness` metadata indicating how up-to-date the knowl
| `codecortex update` | Re-extract changed files, update affected modules |
| `codecortex status` | Show knowledge freshness, stale modules, symbol counts |
| `codecortex symbols [query]` | Browse and filter the symbol index |
| `codecortex search <query>` | Search across all CodeCortex knowledge files |
| `codecortex search <query>` | Search across symbols, file paths, and docs |
| `codecortex modules [name]` | List modules or deep-dive into a specific module |
| `codecortex hotspots` | Show files ranked by risk: churn + coupling + bug history |
| `codecortex hook install\|uninstall\|status` | Manage git hooks for auto-updating knowledge |
| `codecortex upgrade` | Check for and install the latest version |

## Token Efficiency
## How It Works

CodeCortex uses a three-tier memory model to minimize token usage:

```
Session start (HOT only): ~4,300 tokens
Working on a module (+WARM): ~5,000 tokens
Need coding patterns (+COLD): ~5,900 tokens
**Hybrid extraction:** tree-sitter native N-API for structure (symbols, imports, calls across 27 languages) + host LLM for semantics (what modules do, why they're built that way). Zero extra API keys.

vs. raw scan of entire codebase: ~37,800 tokens
```
**Git hooks** keep knowledge fresh — `codecortex update` runs automatically on every commit, re-extracting changed files and updating temporal analysis.
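
A minimal sketch of what installing such a hook involves, assuming a plain `.git/hooks/post-commit` script and that any existing hook is a POSIX shell script (it ignores `core.hooksPath` and worktrees, where `.git` is a file). The marker comment, the backgrounded `codecortex update` call, and `installPostCommitHook` are illustrative assumptions, not what `codecortex hook install` actually writes.

```typescript
import { chmodSync, existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical marker used to recognise a hook block this sketch already wrote.
const MARKER = "# codecortex-auto-update";

// Assumed hook body: run the update in a background subshell so commits stay
// fast and a failed update never fails the commit.
const HOOK_BLOCK = `${MARKER}\n(codecortex update >/dev/null 2>&1 &)\n`;

function installPostCommitHook(repoRoot: string): "installed" | "already-installed" {
  const hooksDir = join(repoRoot, ".git", "hooks");
  const hookPath = join(hooksDir, "post-commit");
  mkdirSync(hooksDir, { recursive: true });
  const existing = existsSync(hookPath) ? readFileSync(hookPath, "utf8") : "";
  if (existing.includes(MARKER)) return "already-installed"; // idempotent
  // Keep any existing hook content and append our block after it.
  const base = existing === "" ? "#!/bin/sh\n" : existing.endsWith("\n") ? existing : existing + "\n";
  writeFileSync(hookPath, base + HOOK_BLOCK);
  chmodSync(hookPath, 0o755); // git only runs hooks that are executable
  return "installed";
}

// Usage against a throwaway directory standing in for a repository.
const demoRepo = mkdtempSync(join(tmpdir(), "cc-hook-"));
console.log(installPostCommitHook(demoRepo)); // installed
console.log(installPostCommitHook(demoRepo)); // already-installed
```

Backgrounding the update keeps `git commit` fast; the trade-off is that the knowledge files may lag the commit by a few seconds.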

85-90% token reduction. 7-10x efficiency gain.
**Size-adaptive responses** — CodeCortex classifies your project (micro → extra-large) and adjusts response caps accordingly. A 23-file project gets full detail. A 6,400-file project gets intelligent summaries. Every MCP tool response stays under 10K chars.
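
The size-adaptive capping can be sketched as follows. Only the 10K-character ceiling and the "micro" and "extra-large" tier names come from the text; the middle tier names, the file-count boundaries, the per-tier item caps, and `renderCapped` are hypothetical.

```typescript
// Hypothetical size tiers and caps; only the 10K-character ceiling and the
// "micro"/"extra-large" names are taken from the README.
type SizeClass = "micro" | "small" | "medium" | "large" | "extra-large";

const MAX_RESPONSE_CHARS = 10_000;
const NOTE_RESERVE = 64; // room kept for the trailing "omitted" note

function classifyProject(fileCount: number): SizeClass {
  if (fileCount <= 50) return "micro";
  if (fileCount <= 200) return "small";
  if (fileCount <= 1_000) return "medium";
  if (fileCount <= 5_000) return "large";
  return "extra-large";
}

// Maximum number of list items per tier (micro projects get everything).
const ITEM_CAPS: Record<SizeClass, number> = {
  micro: Number.POSITIVE_INFINITY,
  small: 100,
  medium: 50,
  large: 30,
  "extra-large": 20,
};

function renderCapped(lines: string[], size: SizeClass, maxChars = MAX_RESPONSE_CHARS): string {
  const kept: string[] = [];
  let used = 0; // characters used so far, counting one newline per kept line
  for (const line of lines) {
    if (kept.length >= ITEM_CAPS[size]) break;
    if (used + line.length + 1 > maxChars - NOTE_RESERVE) break;
    kept.push(line);
    used += line.length + 1;
  }
  const omitted = lines.length - kept.length;
  if (omitted > 0) kept.push(`[${omitted} more omitted]`);
  return kept.join("\n");
}

// Usage: 100 symbol entries rendered for a 23-file and a 6,400-file project.
const entries = Array.from({ length: 100 }, (_, i) => `symbol${i}`);
console.log(renderCapped(entries, classifyProject(23)).split("\n").length); // 100
console.log(renderCapped(entries, classifyProject(6_400)).split("\n").length); // 21
```

Capping by item count first and by characters second means small projects keep full detail, while a single oversized list can never push a response past the ceiling.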

## Supported Languages (27)
