MichTronics/APE


APE - AI Project Engine

Autonomous software engineering — powered by OpenAI · Claude · OpenCode


APE plans, reviews, patches, builds, and commits your code — while you stay in control. Every change is risk-classified and reviewed by the appropriate AI pipeline before a single byte hits disk. In debate mode a full Debate Viewer CLI UI renders each phase in real time — color-coded, structured, and engineer-friendly. Works with any language, any stack, any build system.


What is APE?

APE is an autonomous AI coding agent for any software project: Python services, TypeScript apps, Rust CLIs, Go microservices, Java backends, C/C++ firmware, and everything in between. You give it a goal in plain English; it produces a task plan, generates minimal unified-diff patches, runs your build system, and iterates on failures, with human approval gates before every apply and commit.

Version 2.0 introduces Risk-Gated Debate Mode: every proposed change is automatically routed to the right review pipeline based on a weighted risk classifier. Low-risk changes get a fast single-pass review; high-risk or safety-critical changes (data models, auth flows, transaction handlers, critical system code) go through a 4-phase adversarial AI debate before anything touches your codebase.


Features

| Feature | Detail |
| --- | --- |
| 🧠 Multi-provider AI | OpenAI, Claude, and OpenCode; mix and match any model as proposer or critic |
| 🎯 Risk-gated mode selection | Automatic LITE vs DEBATE routing per change-set |
| ⚔️ 4-phase adversarial debate | Propose → Challenge → Rebuttal → Final audit |
| 🆓 Free model support | OpenCode Zen API: glm-5-free, minimax-m2.5-free, kimi-k2.5-free, big-pickle |
| Debate Viewer CLI UI | Structured, color-coded terminal panels for every debate phase |
| 🔥 Risk heatmap | ASCII per-file risk bars rendered after each debate |
| 🗂️ Debate session logs | Auto-persists _session.json, _patch_v1.diff, _patch_v2.diff |
| 🔒 Firmware-safe guardrails | Blocks struct renames, ISR changes, oversized deletions, and protected-path edits; web/script files are exempt from deletion limits |
| 🩹 Unified diff patches | All changes go through git apply; no full-file overwrites |
| 🔨 Build loop | Run your build after every patch; auto-fix on failure (up to 3 retries) |
| 💰 Budget tracking | Per-provider (OpenAI / Claude) and per-phase USD and token accounting |
| 🧠 Persistent memory | Architecture decisions, constraints, and errors survive sessions |
| ↩️ Resume sessions | Pick up exactly where you left off with --resume |
| 🛑 Human approval gates | Y/N prompts before every apply and commit, always |
| 🌵 Dry-run by default | Nothing written to disk unless you explicitly pass --apply |
| 🔍 Verbose debug mode | --verbose dumps full raw model JSON for every phase |

Quick Start

# Install
cd ape && npm install
cp .env.example .env   # add OPENAI_API_KEY and ANTHROPIC_API_KEY

# Preview what APE would do (safe, no writes)
node index.js \
  --goal="Add rate limiting middleware to the Express API" \
  --type=node \
  --build="npm test"

# Actually apply patches
node index.js \
  --goal="Add rate limiting middleware to the Express API" \
  --type=node \
  --build="npm test" \
  --apply

Risk-Gated Debate Mode

Risk-Gated Debate Mode is the centrepiece of APE v2.0. Every task is classified before any AI call is made:

  Change set
      │
      ▼
 ┌──────────────────────────────┐
 │  Risk Classifier  (0-100)    │
 │  • struct / enum  →  +40     │
 │  • ISR / IRAM_ATTR → +30     │
 │  • memory ops     →  +15     │
 │  • concurrency    →  +15     │
 │  • protected path →  +25     │
 └──────────┬───────────────────┘
            │
  ┌─────────▼──────────┐
  │   Mode Selector    │◄── --lite-only / --debate-only
  └─────────┬──────────┘
            │
   ┌────────┴────────┐
   │                 │
   ▼                 ▼
LITE mode       DEBATE mode
(score < 55)    (score ≥ 55)
1 provider      4 phases:
call            Phase 1 — Proposer generates patch
~$0.01-0.03     Phase 2 — Critic challenges
                Phase 3 — Proposer rebuts
                Phase 4 — Critic final audit
                ~$0.00-0.25 (free w/ OpenCode)

Force-debate conditions override score entirely:

  • Any struct / typedef / enum keyword in the diff
  • ISR or IRAM_ATTR in the diff
  • Protected path (src/protocol, src/radio, src/routing) + patch > 80 lines

→ Full documentation: docs/risk-gated-debate.md
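
The scoring and routing rules above can be sketched roughly as follows. This is a minimal illustration, not the code in riskClassifier.js or modeSelector.js: the weights, the 55-point threshold, the protected paths, and the 80-line limit come from this README, while the function names, the regular expressions, the way patch lines are counted, and the choice to let `liteOnly` win when both flags are set are all assumptions.

```javascript
// Hypothetical sketch of weighted risk scoring plus mode selection.
// Weights and thresholds follow the diagram above; the patterns are guesses.
const RULES = [
  { name: "struct/enum", weight: 40, pattern: /\b(struct|typedef|enum)\b/ },
  { name: "ISR", weight: 30, pattern: /\b(ISR|IRAM_ATTR)\b/ },
  { name: "memory ops", weight: 15, pattern: /\b(malloc|free|memcpy|memset)\s*\(/ },
  { name: "concurrency", weight: 15, pattern: /(mutex|semaphore|atomic)/i },
];
const PROTECTED_PATHS = ["src/protocol", "src/radio", "src/routing"];
const DEBATE_THRESHOLD = 55;
const LARGE_PATCH_LINES = 80;

function classify(diff, files) {
  // Collect every rule whose pattern appears anywhere in the diff text.
  const triggers = RULES.filter((rule) => rule.pattern.test(diff));
  let score = triggers.reduce((sum, rule) => sum + rule.weight, 0);

  const touchesProtected = files.some((file) =>
    PROTECTED_PATHS.some((prefix) => file.startsWith(prefix + "/"))
  );
  if (touchesProtected) score += 25;
  score = Math.min(score, 100); // keep the score inside the 0-100 range

  // Force-debate conditions apply regardless of the numeric score.
  const patchLines = diff.split("\n").length;
  const forceDebate =
    triggers.some((rule) => rule.name === "struct/enum" || rule.name === "ISR") ||
    (touchesProtected && patchLines > LARGE_PATCH_LINES);

  return { score, forceDebate, triggers: triggers.map((rule) => rule.name) };
}

function selectMode(risk, flags = {}) {
  // CLI flags override the classifier.
  if (flags.liteOnly) return "lite";
  if (flags.debateOnly) return "debate";
  return risk.forceDebate || risk.score >= DEBATE_THRESHOLD ? "debate" : "lite";
}
```

Note that a lone struct keyword scores only 40, below the threshold, which is why the force-debate conditions are checked separately from the score.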


Debate Viewer CLI UI

When running in debate mode, APE renders a full structured terminal UI as each phase completes:

══════════════════════════════════════════════════════════════
  AI DEBATE SESSION  [Task 3]
──────────────────────────────────────────────────────────────
  Mode:        debate
  Risk Level:  HIGH
  Risk Score:  72
  Triggers:
    · struct keyword detected
    · TTL logic modified
══════════════════════════════════════════════════════════════

[PHASE 1] GPT Proposal
──────────────────────────────────────────────────────────────
  Files:       src/mesh_rx.c, src/routing.c
  Patch lines: 184
  Self risk:   medium
  Confidence:  78%

[PHASE 2] Critic Challenge
──────────────────────────────────────────────────────────────
  ⚠ mesh_rx.c:142-168
      Issue:    Possible race condition on shared buffer
      Severity: CRITICAL
  ⚠ packet.h:33-48
      Issue:    Enum order modified (protocol risk)
      Severity: MEDIUM

[PHASE 3] Proposer Defense
──────────────────────────────────────────────────────────────
  ✔ Reverted enum reorder
  ✔ Added boundary guard for TTL decrement
  ✔ Wrapped shared buffer access in mutex
--- PATCH CHANGES (v1 → v2) ---
  Lines removed: 12    Lines added: 18

[PHASE 4] Final Audit
──────────────────────────────────────────────────────────────
  Remaining issues: none
  Final Risk:  LOW
  Confidence:  84%

══════════════════════════════════════════════════════════════
  FINAL DECISION
──────────────────────────────────────────────────────────────
  Mode used:         DEBATE
  Allow Apply:       YES
  Allow Commit:      NO (requires human)
  Final Confidence:  82%
══════════════════════════════════════════════════════════════

Risk Heatmap:
  mesh_rx.c    ████████░░  70%
  routing.c    ███░░░░░░░  30%
  packet.h     ██████████  90%

Apply patch? (y/n)

Color coding: RED = critical, YELLOW = medium/high, GREEN = safe/low. Enable --verbose to print full raw model JSON after each phase.
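
A per-file risk bar like the heatmap above could be rendered along these lines. This is only a sketch: `riskBar` and `renderHeatmap` are hypothetical names, and it assumes ten cells with simple rounding (one cell per 10%), which may not be the rounding APE itself uses.

```javascript
// Hypothetical sketch of an ASCII risk heatmap; not the code in debateViewer.js.
function riskBar(percent, width = 10) {
  // Clamp to 0-100, then round to the nearest whole cell.
  const clamped = Math.max(0, Math.min(100, percent));
  const filled = Math.round((clamped / 100) * width);
  return "█".repeat(filled) + "░".repeat(width - filled);
}

function renderHeatmap(risks) {
  // Pad file names so the bars line up in a column.
  const nameWidth = Math.max(...Object.keys(risks).map((name) => name.length));
  return Object.entries(risks)
    .map(([file, pct]) => `  ${file.padEnd(nameWidth)}  ${riskBar(pct)}  ${pct}%`)
    .join("\n");
}

console.log(renderHeatmap({ "mesh_rx.c": 70, "routing.c": 30 }));
```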

Session artifacts persisted to <target>/.ape/sessions/:

.ape/sessions/
  <ts>_session.json     full 4-phase debate record
  <ts>_patch_v1.diff    original proposer patch
  <ts>_patch_v2.diff    revised patch after defense

CLI Flags

Core

| Flag | Default | Description |
| --- | --- | --- |
| --goal | required | What to build or fix |
| --type | node | See Project Types table below. 23 types supported. |
| --build | (none) | Build command, e.g. npm test, cargo test, pytest, make, dotnet build |
| --target | cwd | Path to your project directory |
| --max-budget | 5.00 | USD spending cap |
| --max-tokens | 500000 | Total token cap |
| --resume | false | Resume from ape-state.json |
| --no-git | false | Skip all git operations |

Review Mode

| Flag | Default | Description |
| --- | --- | --- |
| --lite-only | false | Force single-pass LITE review for all tasks |
| --debate-only | false | Force 4-phase DEBATE review for all tasks |

Patch Application

| Flag | Default | Description |
| --- | --- | --- |
| --apply | false | Write patches to disk. Without this, APE is in dry-run mode |

Safety

| Flag | Default | Description |
| --- | --- | --- |
| --allow-protected | false | Allow changes to protected paths (e.g. src/protocol, src/radio, src/routing) |
| --allow-isr | false | Allow patches touching ISR / IRAM_ATTR code |
| --confidence-threshold | 70 | Minimum AI confidence score (0-100) to allow apply |
| --auto-commit | false | Auto-propose commit after each task (human still approves) |

Debug

| Flag | Default | Description |
| --- | --- | --- |
| --verbose | false | Print full raw model JSON responses after each debate phase |

→ Full reference: docs/cli-reference.md
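
To make the defaults above concrete, here is a minimal sketch of how flags in the `--name=value` form could be folded into an options object. It is not APE's actual parser in index.js; `parseFlags`, `DEFAULTS`, and the error messages are hypothetical, and only the flag names and default values are taken from the tables.

```javascript
// Hypothetical flag parser; defaults mirror the CLI tables above.
const DEFAULTS = {
  type: "node",
  build: null,
  target: process.cwd(),
  "max-budget": 5.0,
  "max-tokens": 500000,
  resume: false,
  "no-git": false,
  "lite-only": false,
  "debate-only": false,
  apply: false, // dry-run unless --apply is passed
  "allow-protected": false,
  "allow-isr": false,
  "confidence-threshold": 70,
  "auto-commit": false,
  verbose: false,
};
const NUMERIC = new Set(["max-budget", "max-tokens", "confidence-threshold"]);

function parseFlags(argv) {
  const opts = { ...DEFAULTS };
  for (const arg of argv) {
    const match = /^--([a-z-]+)(?:=(.*))?$/s.exec(arg);
    if (!match) throw new Error(`Unrecognised argument: ${arg}`);
    const [, name, value] = match;
    if (name !== "goal" && !(name in DEFAULTS)) throw new Error(`Unknown flag: --${name}`);

    const isBoolean = typeof DEFAULTS[name] === "boolean";
    if (value === undefined) {
      // Bare flags such as --apply are only valid for boolean options.
      if (!isBoolean) throw new Error(`--${name} requires a value`);
      opts[name] = true;
    } else if (isBoolean) {
      opts[name] = value !== "false";
    } else {
      opts[name] = NUMERIC.has(name) ? Number(value) : value;
    }
  }
  if (!opts.goal) throw new Error("--goal is required");
  return opts;
}
```

For example, `parseFlags(process.argv.slice(2))` on the first Quick Start command would leave `apply` at `false`, which corresponds to dry-run mode.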


Project Types

Pass any of the following to --type. Each type sets the planner prompt, build conventions, and guardrail rules appropriate for that stack.

| Type | Stack | Build command hint |
| --- | --- | --- |
| embedded | C/C++ firmware (ESP-IDF / Arduino / bare-metal) | idf.py build / pio run |
| cli | CLI tool in any language (Python, Go, Rust, Node, C…) | (varies by language) |
| node | Node.js (Express / general) | npm test |
| web | Generic browser front-end (HTML + JS) | (none) |
| htmlcss | Pure HTML + CSS, no build tool | (none) |
| python | Python 3 scripts / libraries | pytest |
| react | React 18 + Vite SPA | npm test |
| api | Node/Express REST API | npm test |
| rust | Rust (Cargo 2021) | cargo test |
| docker | Dockerfiles + Compose only | docker build . |
| arduino | Arduino / PlatformIO sketches | pio run |
| nextjs | Next.js 14 App Router + Tailwind | npm test |
| go | Go modules (go.mod) | go test ./... |
| fastapi | FastAPI + Pydantic v2 | pytest |
| bash | Bash scripts | shellcheck |
| svelte | SvelteKit + TypeScript | npm test |
| tauri | Tauri (Rust backend + web front-end) | cargo test |
| vscode-ext | VS Code extension | npm test |
| terraform | Terraform / HCL2 | terraform validate |
| platformio | PlatformIO (embedded) | pio test |
| dotnet | .NET 8 / C# | dotnet test |
| cpp | C++17/20 with CMake | cmake --build build |
| c | C11 with Makefile / CMake | make |

How It Works

 1. PLAN        Proposer model generates architecture + task list
                Critic model reviews and refines the plan
                                │
 2. FOR EACH TASK:              │
    ╔══════════════════════════╗│
    ║ Pre-guardrail            ║│
    ║ Protected path check     ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Risk Classifier          ║│
    ║ Score 0-100, detect ISR  ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Mode Selector            ║│
    ║ LITE or DEBATE           ║│
    ╚══════════╤═══════════════╝│
               │                │
    ┌──────────▼──────────┐     │
    │ Review Pipeline     │     │
    │ (LITE or DEBATE)    │     │
    └──────────┬──────────┘     │
               │                │
    ╔══════════▼═══════════════╗│
    ║ Post-guardrail           ║│
    ║ checkPatch()             ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Consensus                ║│
    ║ allow_apply?             ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Dry-run gate             ║│  ← default: stop here
    ║ --apply required         ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Human Y/N                ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ git apply                ║│
    ║ hard fail if rejected    ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Build + fix loop         ║│
    ║ up to 3 retries          ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Human Y/N commit         ║│
    ╚══════════╤═══════════════╝│
               │                │
    ╔══════════▼═══════════════╗│
    ║ Save report artifact     ║│
    ╚══════════════════════════╝│
                                │
 3. SUMMARY budget + phases ────┘
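
The gate sequence in the diagram above can be read as a chain of early exits followed by a bounded build loop. The sketch below illustrates that reading only; `evaluateGates` and `buildWithRetries` are hypothetical names, the callbacks are synchronous for brevity, and tying the consensus gate to `--confidence-threshold` is an assumption based on the flag's description.

```javascript
// Hypothetical sketch of the per-task gates; not the code in orchestrator.js.
function evaluateGates({
  guardrailOk,
  confidence,
  threshold = 70, // default of --confidence-threshold
  applyFlag = false, // true only when --apply is passed
  humanApproved = false,
}) {
  if (!guardrailOk) return { apply: false, stoppedAt: "post-guardrail" };
  if (confidence < threshold) return { apply: false, stoppedAt: "consensus" };
  if (!applyFlag) return { apply: false, stoppedAt: "dry-run" };
  if (!humanApproved) return { apply: false, stoppedAt: "human-approval" };
  return { apply: true, stoppedAt: null };
}

// Run the build once, then ask for a fix and rebuild up to maxRetries times.
function buildWithRetries(runBuild, requestFix, maxRetries = 3) {
  let result = runBuild();
  let retries = 0;
  while (!result.ok && retries < maxRetries) {
    requestFix(result.errors); // e.g. feed build errors back to the proposer
    retries += 1;
    result = runBuild();
  }
  return { ok: result.ok, retries };
}
```

With the defaults, a patch that passes review still stops at the dry-run gate, matching the "default: stop here" note in the diagram.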

Architecture

ape/
├── index.js                        CLI entry — arg parsing, option assembly
└── src/
    │
    ├── orchestrator.js             Master loop: risk → mode → review → apply
    │
    ├── ── Planning ─────────────────────────────────────────────────────────
    ├── planner.js                  GPT creates task list for any project type; Claude refines
    ├── taskManager.js              Dependency-aware queue; done/failed/pending
    ├── memory.js                   Arch decisions, constraints, error history
    ├── stateTracker.js             Iteration counter, current task, budget snapshot
    │
    ├── ── Risk & Mode ──────────────────────────────────────────────────────
    ├── riskClassifier.js           Weighted 0-100 score; detects ISR/struct/protected
    ├── modeSelector.js             lite | debate; CLI flags override classifier
    │
    ├── ── Review Pipelines ─────────────────────────────────────────────────
    ├── liteReviewer.js             Single pass → unified diff (LITE mode)
    ├── debateOrchestrator.js       4-phase adversarial debate (DEBATE mode)
    ├── debateViewer.js             Debate Viewer CLI UI — panels, heatmap, prompts, log persist
    ├── critiqueParser.js           Parses/normalises all 4 phase JSON; safe fallbacks
    ├── rebuttalEngine.js           Phase 3 rebuttal; addressed_items tracking
    ├── consensus.js                fromLite / fromDebate → CONSENSUS_OUTPUT
    │
    ├── ── Patch Lifecycle ──────────────────────────────────────────────────
    ├── patchApplier.js             applyDiff / saveDiff / previewDiff
    ├── guardrails.js               checkPatch / checkPaths / checkFiles (file-type-aware)
    ├── patchGenerator.js           Legacy helper; used for record saving
    │
    ├── ── Build & Git ──────────────────────────────────────────────────────
    ├── buildRunner.js              Run build command; extract errors
    ├── gitManager.js               Branch, stage, commit, awaitApproval
    │
    ├── ── AI Providers ─────────────────────────────────────────────────────
    ├── openai.js                   OpenAIProvider + legacy completeJSON helpers
    ├── claude.js                   ClaudeProvider + legacy completeJSON helpers
    ├── providers/LLMProvider.js    Abstract base — generate(prompt, options)
    ├── providers/providerFactory.js createProvider('openai'|'claude'|'opencode')
    ├── providers/OpenCodeProvider.js fetch-based; free model allowlist; 4096 token default
    ├── core/DebateSession.js       Session state: proposer/critic providers + model names
    │
    └── ── Infrastructure ───────────────────────────────────────────────────
        ├── budgetManager.js        Per-model + per-phase USD + token tracking
        ├── logger.js               Coloured console output; modeDecision, riskScore…
        └── config.js               .env validation; throws on missing keys

Artifacts

Every run writes structured artifacts into your project:

<your-project>/
└── .ape/
    ├── patches/        <ts>_<taskId>.diff     every attempted patch (audit trail)
    ├── debates/        <ts>_<taskId>.json     full 4-phase debate records
    ├── sessions/       <ts>_session.json      debate viewer session log
    │                   <ts>_patch_v1.diff     original proposer patch
    │                   <ts>_patch_v2.diff     revised patch after defense (if changed)
    ├── memory.json     architecture decisions, constraints, error history
    └── state.json      iteration counter, task status, budget snapshot
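
As a small illustration of the layout above, the following sketch builds the artifact paths for one task. The helper name `artifactPaths` is hypothetical, and the format of the `ts` timestamp prefix is not specified in this README, so the caller supplies it.

```javascript
// Hypothetical helper that mirrors the .ape/ layout shown above.
const path = require("node:path");

function artifactPaths(projectDir, ts, taskId) {
  const root = path.join(projectDir, ".ape");
  return {
    patch: path.join(root, "patches", `${ts}_${taskId}.diff`),
    debate: path.join(root, "debates", `${ts}_${taskId}.json`),
    session: path.join(root, "sessions", `${ts}_session.json`),
    patchV1: path.join(root, "sessions", `${ts}_patch_v1.diff`),
    patchV2: path.join(root, "sessions", `${ts}_patch_v2.diff`),
    memory: path.join(root, "memory.json"),
    state: path.join(root, "state.json"),
  };
}
```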

Setup

# 1. Install dependencies
cd ape && npm install

# 2. Configure API keys
cp .env.example .env

Edit .env:

# Required for OpenAI / Claude (default providers)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optional: choose which model fills each debate role
# PROPOSER_PROVIDER=openai    # openai | claude | opencode
# PROPOSER_MODEL=gpt-4.1
# CRITIC_PROVIDER=claude      # openai | claude | opencode
# CRITIC_MODEL=claude-opus-4-5

# Optional: use OpenCode free models (no API key needed for free tier)
# OPENCODE_ZEN_BASE_URL=https://opencode.ai
# PROPOSER_PROVIDER=opencode
# PROPOSER_MODEL=glm-5-free
# CRITIC_PROVIDER=opencode
# CRITIC_MODEL=kimi-k2.5-free

# 3. Verify
node index.js --help

AI Providers

APE uses a proposer / critic model: one AI proposes the patch, another challenges it. You can assign any supported provider to either role via .env.

Supported providers

| Provider | Key | Models |
| --- | --- | --- |
| OpenAI | openai | gpt-4.1, gpt-4o, any GPT model |
| Anthropic (Claude) | claude | claude-opus-4-5, any Claude model |
| OpenCode Zen API | opencode | glm-5-free, minimax-m2.5-free, kimi-k2.5-free, big-pickle (free) |

How to configure

Set these four variables in your .env:

PROPOSER_PROVIDER=openai        # who generates the patch
PROPOSER_MODEL=gpt-4.1
CRITIC_PROVIDER=claude          # who challenges and audits it
CRITIC_MODEL=claude-opus-4-5
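
One way to picture how these four variables map onto the two debate roles is the sketch below. It is an illustration rather than the logic in providerFactory.js: `resolveRole` is a hypothetical name, the fallback values simply reuse the example pairing shown above, and requiring an explicit model when the provider is changed is an assumption.

```javascript
// Hypothetical resolution of PROPOSER_* / CRITIC_* variables into a role config.
const SUPPORTED = ["openai", "claude", "opencode"];
const ROLE_DEFAULTS = {
  PROPOSER: { provider: "openai", model: "gpt-4.1" },
  CRITIC: { provider: "claude", model: "claude-opus-4-5" },
};

function resolveRole(role, env = process.env) {
  const fallback = ROLE_DEFAULTS[role];
  if (!fallback) throw new Error(`Unknown role: ${role}`);

  const provider = env[`${role}_PROVIDER`] || fallback.provider;
  if (!SUPPORTED.includes(provider)) {
    throw new Error(`${role}_PROVIDER must be one of: ${SUPPORTED.join(", ")}`);
  }

  // Only reuse the fallback model when the provider is also the fallback one.
  const model =
    env[`${role}_MODEL`] || (provider === fallback.provider ? fallback.model : undefined);
  if (!model) throw new Error(`${role}_MODEL must be set when ${role}_PROVIDER=${provider}`);

  return { provider, model };
}
```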

OpenCode (free, no API key needed)

OpenCode exposes an OpenAI-compatible /zen/v1/chat/completions endpoint. The free models are allowlisted by default — no billing setup required.

Step 1 — add to .env:

OPENCODE_ZEN_BASE_URL=https://opencode.ai
# OPENCODE_ZEN_API_KEY=          ← leave blank for free tier
OPENCODE_DEFAULT_MODEL=glm-5-free

Step 2 — choose a debate pairing:

# All-free debate (GLM proposes, Kimi critiques)
PROPOSER_PROVIDER=opencode
PROPOSER_MODEL=glm-5-free
CRITIC_PROVIDER=opencode
CRITIC_MODEL=kimi-k2.5-free

# Mixed: GPT proposes, OpenCode critiques for free
PROPOSER_PROVIDER=openai
PROPOSER_MODEL=gpt-4.1
CRITIC_PROVIDER=opencode
CRITIC_MODEL=minimax-m2.5-free

Step 3 — run APE normally:

node index.js \
  --goal="Add error handling to the data pipeline" \
  --type=python --build="pytest" --apply

Allowlist: by default only glm-5-free, minimax-m2.5-free, kimi-k2.5-free, and big-pickle are accepted. Set OPENCODE_ALLOW_ANY_MODEL=1 to bypass the check for other model strings.
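
The allowlist check and the OpenAI-compatible request shape could look roughly like this. It is a sketch under assumptions, not the code in OpenCodeProvider.js: the function names are hypothetical, the Bearer authorization header is assumed from the OpenAI convention, and the request is only built here, not sent.

```javascript
// Hypothetical sketch: allowlist check plus request construction for the Zen endpoint.
const FREE_MODELS = ["glm-5-free", "minimax-m2.5-free", "kimi-k2.5-free", "big-pickle"];

function assertModelAllowed(model, env = process.env) {
  if (env.OPENCODE_ALLOW_ANY_MODEL === "1") return; // explicit bypass
  if (!FREE_MODELS.includes(model)) {
    throw new Error(`Model "${model}" is not allowlisted; set OPENCODE_ALLOW_ANY_MODEL=1 to bypass`);
  }
}

function buildZenRequest(prompt, model, env = process.env) {
  assertModelAllowed(model, env);
  // Strip trailing slashes so the joined URL has exactly one separator.
  const baseUrl = (env.OPENCODE_ZEN_BASE_URL || "https://opencode.ai").replace(/\/+$/, "");
  const headers = { "Content-Type": "application/json" };
  if (env.OPENCODE_ZEN_API_KEY) {
    headers.Authorization = `Bearer ${env.OPENCODE_ZEN_API_KEY}`; // assumed scheme
  }
  return {
    url: `${baseUrl}/zen/v1/chat/completions`,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        max_tokens: 4096, // the 4096-token default noted in the Architecture section
      }),
    },
  };
}
```

The returned pair can be passed straight to `fetch(url, init)` on Node.js 18 or later.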


Examples

# Safe preview — see the plan without touching any files
node index.js \
  --goal="Add input validation and error handling to the user registration endpoint" \
  --type=node

# FastAPI service
node index.js \
  --goal="Add JWT authentication to the FastAPI backend" \
  --type=fastapi \
  --build="pytest" \
  --apply --max-budget=5.00

# Force full adversarial debate on critical payment logic
node index.js \
  --goal="Refactor the payment transaction rollback handler" \
  --type=node \
  --debate-only --allow-protected \
  --build="npm test" \
  --apply --max-budget=10.00

# Force debate with verbose model JSON output
node index.js \
  --goal="Refactor routing layer" \
  --type=embedded \
  --debate-only --apply --verbose

# Rust CLI tool
node index.js \
  --goal="Add async file processing with progress bar" \
  --type=cli --target=./my-cli \
  --build="cargo test" \
  --apply --max-budget=3.00

# Resume an interrupted session
node index.js \
  --goal="Add JWT authentication to the FastAPI backend" \
  --type=fastapi --build="pytest" \
  --resume --apply

# Zero-cost debate using OpenCode free models
node index.js \
  --goal="Refactor the data pipeline module" \
  --type=python --build="pytest" \
  --debate-only --apply
# (set PROPOSER_PROVIDER=opencode CRITIC_PROVIDER=opencode in .env first)

Documentation

| Doc | Description |
| --- | --- |
| docs/architecture.md | Full data-flow diagrams, module map, session state, risk scoring table |
| docs/risk-gated-debate.md | LITE mode, DEBATE mode (all 4 phases), debate viewer UI, consensus, budget fallback |
| docs/cli-reference.md | Every CLI flag with defaults, types, and examples |
| docs/guardrails.md | Pre/post guardrails, protected paths, deletion ratio, custom config |
| docs/modules.md | Full public API for every module in src/ |

License

MIT License

Copyright (c) 2026 APE Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

APE is a risk-governed multi-LLM debate orchestration engine that enables structured model-vs-model reasoning, deterministic consensus scoring, and production-grade decision pipelines.
