Autonomous AI coding assistant for the terminal — powered by local LLMs on Apple Silicon.
Getting Started · Commands · Tool System · Project Creation · Configuration
UltraAgent is a terminal-based AI coding assistant that operates autonomously using an integrated tool system. It reads, writes, and edits files, runs shell commands, searches codebases, manages git, spawns sub-agents, persists memory, and browses the web — all controlled by a configurable three-tier permission system.
Runs 100% local on your machine — no API keys, no cloud, no data leaves your computer. Supports Ollama, LM Studio, MLX-LM, and llama.cpp.
- 100% local & private — no API keys, no cloud, all data stays on your machine
- Auto-setup — detects your hardware and running backends, then recommends the optimal model and settings
- 29 built-in tools — file I/O, shell, git, sub-agents, memory, web, planning
- Project creation wizard — scaffold production-ready projects from scratch
- Project scanner — deep analysis auto-injected into every LLM request
- Three-tier permissions — safe / confirm / dangerous, user approves before risky ops
- Agentic loop — LLM autonomously selects tools, executes, observes, iterates (up to 30 rounds)
- Sub-agents — spawn parallel agents for concurrent tasks
- Persistent memory — context that survives across sessions
- Apple Silicon optimized — curated model recommendations for M-series chips
- Smart caching — LM Studio KV-Cache, Flash Attention, quantized KV for optimal performance
- Real-time streaming — token-by-token output with repetition detection
```
npm install -g ultraagent
ultraagent local setup   # auto-detects hardware, backend, sets optimal config
ultraagent chat          # start coding
```

Or run from source:

```
git clone https://github.com/zurd46/UltraAgent.git
cd UltraAgent
npm install
npm run dev -- local setup   # auto-detects hardware, backend, sets optimal config
npm run dev -- chat          # start coding
```

The `local setup` command automatically:
- Detects your chip, RAM, GPU cores
- Finds running backends (LM Studio, Ollama, MLX-LM, llama.cpp)
- Recommends the best model for your hardware
- Sets optimal context length, KV-cache, batch size, flash attention
- Verifies the connection
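The detection step boils down to querying the OS for hardware facts. A minimal sketch of just the RAM check, under the assumption that the real implementation (in `src/utils/hardware.ts`) differs in detail:

```shell
# Rough sketch of the RAM-detection step (assumption: the actual
# implementation in src/utils/hardware.ts is more thorough).
if [ "$(uname)" = "Darwin" ]; then
  ram_bytes=$(sysctl -n hw.memsize)            # macOS: total physical memory
else
  kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
  ram_bytes=$(( kb * 1024 ))                   # Linux fallback
fi
ram_gb=$(( ram_bytes / 1073741824 ))
echo "Detected RAM: ${ram_gb} GB"
```

The detected value is what drives the model recommendation table below.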
To configure a backend manually instead:

```
npm run dev -- config set provider local-openai
npm run dev -- config set localModel qwen2.5-coder:14b
npm run dev -- config set localBaseUrl http://localhost:1234/v1
npm run dev -- chat
```

To build and install globally from source:

```
npm run build && npm install -g .
ultraagent local setup
ultraagent chat
```

Alias: `ua` works everywhere instead of `ultraagent`.
| Backend | Provider | Default Port | Description |
|---|---|---|---|
| LM Studio | `local-openai` | `localhost:1234` | Desktop app with Metal GPU, automatic prompt caching |
| Ollama | `ollama` | `localhost:11434` | Simplest setup, native Metal GPU support |
| MLX-LM | `local-openai` | `localhost:8080` | Apple's ML framework, fastest on M-series chips |
| llama.cpp | `local-openai` | `localhost:8080` | Maximum control, OpenAI-compatible API |
| RAM | Model | Size | Quality | Speed |
|---|---|---|---|---|
| 64GB+ | Llama 3.3 70B (Q4) | 40 GB | Excellent | Slow |
| 32GB | Qwen 2.5 Coder 32B (Q4) | 18 GB | Excellent | Medium |
| 32GB | Codestral 22B | 13 GB | Excellent | Medium |
| 16GB | Qwen 2.5 Coder 14B | 9 GB | Excellent | Fast |
| 16GB | DeepSeek Coder V2 16B | 9 GB | Excellent | Fast |
| 8GB | Llama 3.1 8B | 4.7 GB | Good | Fast |
| 4GB | Qwen 2.5 Coder 3B | 1.9 GB | Basic | Fast |
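The sizes in the table follow a simple rule of thumb: file size ≈ parameters × bits per weight ÷ 8. A sketch for the 14B row, assuming roughly 4.8 bits per weight for a Q4-class quantization (an approximation, not the exact figure for any specific GGUF variant):

```shell
# Estimate quantized model file size: params (billions) × bits/weight ÷ 8.
# 4.8 bits/weight is a rough average for Q4_K-class quants (assumption).
params_b=14
bpw_x10=48                                  # 4.8 bits × 10 for integer math
size_gb_x10=$(( params_b * bpw_x10 / 8 ))   # result in tenths of a GB
echo "~$(( size_gb_x10 / 10 )).$(( size_gb_x10 % 10 )) GB"
```

This lands near the table's 9 GB figure; the gap covers metadata and runtime buffers.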
```
ultraagent models --ram 32   # show recommendations for your RAM
```

When using LM Studio, set these in Settings > Server for best performance:
| Setting | Recommended | Why |
|---|---|---|
| Flash Attention | On (not Auto!) | Reduces memory for long contexts |
| KV Cache Quantization | Q8_0 (16-32GB) / F16 (64GB+) | Halves KV-cache memory, minimal quality loss |
| GPU Offload | Max (all layers) | Full Metal GPU acceleration |
| Prompt Caching | Auto | LM Studio caches automatically |
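The KV-cache row is easy to sanity-check: cache size = 2 (K and V) × layers × context length × KV heads × head dim × bytes per element. A sketch for a hypothetical 14B-class model with 40 layers, 8 KV heads, and head dim 128 (illustrative numbers, not any specific model's config):

```shell
# KV-cache memory at a full 32k context: F16 (2 bytes/elem) vs Q8_0 (~1 byte).
layers=40; ctx=32768; kv_heads=8; head_dim=128
f16_mib=$(( 2 * layers * ctx * kv_heads * head_dim * 2 / 1048576 ))
q8_mib=$((  2 * layers * ctx * kv_heads * head_dim * 1 / 1048576 ))
echo "F16: ${f16_mib} MiB  Q8_0: ${q8_mib} MiB"
```

Q8_0 cuts the cache roughly in half, which is exactly the table's claim.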
`ultraagent local setup` sets these values automatically based on your hardware.
```
ultraagent <command> [options]
```
| Command | Description |
|---|---|
| `new` | Interactive wizard to scaffold a complete new project |
| `scan` | Deep-scan the project into `docs/scan.md` (auto-injected into LLM context) |
| `create` | Create a project from a template |
| `analyze` | Analyze an existing project's structure and codebase |
| Command | Description |
|---|---|
| `chat` | Interactive session with full tool access |
| `run <prompt>` | Execute a single prompt non-interactively |
| `edit` | Edit or refactor existing code |
| `plan <prompt>` | Generate a structured project plan |
| `code <prompt>` | Generate code from a prompt |
| Command | Description |
|---|---|
| `config show` | Display current configuration |
| `config set <key> <value>` | Update a configuration value |
| `config setup` | Setup wizard (redirects to `local setup`) |
| `config reset` | Reset to defaults |
| `models` | Recommended local models for your hardware |
| `local setup` | Auto-detect hardware & set optimal config |
| `local status` | Check local LLM server health and current settings |
| `local pull [model]` | Download a model via Ollama |
| `local list` | List installed Ollama models |
UltraAgent ships with 29 tools that the LLM invokes autonomously during a session. Each tool has an assigned permission level.
| Level | Behavior |
|---|---|
| Safe | Executes immediately — no confirmation needed |
| Confirm | Requires user approval (file writes, commits) |
| Dangerous | Requires approval with highlighted warning (shell commands) |
When prompted: [y]es · [n]o · [a]lways allow · [d]eny always
Core File Tools
| Tool | Permission | Description |
|---|---|---|
| `read_file` | Safe | Read file contents with line numbers, offset, limit |
| `write_file` | Confirm | Create or overwrite files; auto-creates directories |
| `edit_file` | Confirm | Precise string replacement via diff-based editing |
| `bash` | Dangerous | Execute shell commands with timeout and output limits |
| `glob` | Safe | Find files by glob pattern |
| `grep` | Safe | Search file contents with regex and context lines |
Git Tools
| Tool | Permission | Description |
|---|---|---|
| `git_status` | Safe | Working tree status |
| `git_diff` | Safe | Staged and unstaged changes |
| `git_log` | Safe | Recent commit history |
| `git_commit` | Confirm | Create a commit |
| `git_branch` | Confirm | Create, switch, or list branches |
| `git_merge` | Dangerous | Merge branches |
| `git_revert` | Confirm | Revert a commit |
| `git_stash` | Confirm | Stash or pop changes |
| `git_cherry_pick` | Confirm | Cherry-pick commits |
| `git_rebase` | Dangerous | Rebase branches |
GitHub Tools
| Tool | Permission | Description |
|---|---|---|
| `gh_pr` | Confirm | Create, list, merge, review pull requests (requires `gh` CLI) |
| `gh_issue` | Confirm | Create, list, close, comment on issues (requires `gh` CLI) |
Advanced Tools
| Tool | Permission | Description |
|---|---|---|
| `sub_agent` | Confirm | Spawn a sub-agent for independent tasks |
| `sub_agent_batch` | Confirm | Run multiple sub-agents in true parallel |
| `sub_agent_status` | Safe | Check status and results of running sub-agents |
| `memory_save` | Safe | Persist context to memory |
| `memory_search` | Safe | Search persistent memory |
| `memory_delete` | Safe | Delete a memory entry |
| `web_search` | Safe | Search the web |
| `web_fetch` | Safe | Fetch content from a URL |
| `plan_create` | Safe | Create a structured task plan |
| `plan_status` | Safe | Show plan progress |
| `plan_update` | Safe | Update task status |
```
ultraagent new
```

Interactive wizard that scaffolds complete, production-ready projects.

```
# Non-interactive
ultraagent new --name my-app --type fullstack --stack nextjs-prisma-pg --prompt "E-commerce platform"
```

| Option | Description |
|---|---|
| `-n, --name <name>` | Project name |
| `-t, --type <type>` | Project type |
| `-s, --stack <stack>` | Tech stack |
| `-d, --dir <path>` | Parent directory |
| `-p, --prompt <prompt>` | Project description |
Supported Project Types & Stacks
| Type | Stacks |
|---|---|
| webapp | React+Vite, Next.js, Vue 3+Vite, Svelte, Astro |
| fullstack | Next.js+Prisma+PG, React+Express, Vue+FastAPI, T3 Stack |
| api | Express, Fastify, NestJS, FastAPI, Hono, Go+Gin, Rust+Actix |
| cli | TypeScript+Commander, Python+Click, Rust+Clap, Go+Cobra |
| library | TypeScript npm, Python PyPI, Rust Crate |
| mobile | React Native+Expo, React Native Bare |
| desktop | Electron+React, Tauri+React |
| monorepo | Turborepo, Nx, pnpm Workspaces |
| ai-ml | Python+LangChain, Python+PyTorch, TypeScript+LangChain |
| custom | Describe what you need |
Optional Features
- Docker + docker-compose
- CI/CD (GitHub Actions)
- Testing (Unit + Integration)
- Linting + Formatting (ESLint / Prettier)
- Authentication
- Database setup
- API Documentation (OpenAPI / Swagger)
- Environment variables (.env)
- Logging
- Error handling / Monitoring
A complete project with directory structure, config files, typed source code, test setup, README, .gitignore, build scripts, initialized git repo, installed dependencies, and auto-generated docs/scan.md.
```
ultraagent scan                  # scan current project
ultraagent scan --force          # force re-scan
ultraagent scan --dir /path/to/project
```

Creates `docs/scan.md` — a comprehensive project analysis that is automatically injected into every LLM request, giving the model full project context.
What scan.md contains
| Section | Content |
|---|---|
| Overview | Name, type, language, framework, package manager, file count, LOC |
| Git | Branch, remote, last commit |
| Directory Structure | Full tree (4 levels) |
| Key Files | Entry points with roles |
| Dependencies | All deps with versions |
| Scripts | npm scripts with commands |
| Configuration | tsconfig, eslint, prettier, docker, CI/CD, etc. |
| Test Files | All test file paths |
| API Routes | Detected route files |
| Environment Variables | From .env.example |
| Lines of Code | Total and per extension |
| Architecture | AI-generated analysis of patterns and conventions |
Cache: `scan.md` is valid for 24 hours. Use `--force` to regenerate.
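The 24-hour rule can be reproduced with `find -mmin`; a sketch of the idea, under the assumption that the actual check (in `src/commands/scan.ts`) may compare timestamps differently:

```shell
# Demonstrate the 24h (1440-minute) cache rule on a throwaway file.
tmp=$(mktemp -d)
touch "$tmp/scan.md"                         # freshly generated scan
if [ -z "$(find "$tmp/scan.md" -mmin +1440)" ]; then
  status=fresh                               # younger than 24h: reuse it
else
  status=stale                               # older than 24h: regenerate
fi
echo "scan.md is $status"
rm -rf "$tmp"
```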
| Mode | Purpose |
|---|---|
| `chat` | General assistance — full tool access, memory, planning |
| `create` | Project setup — structure, dependencies, config |
| `analyze` | Code review — architecture, patterns, quality |
| `edit` | Refactoring — precise edits, minimal changes |
| `plan` | Task planning — phases, dependencies, risks |
| `code` | Code generation — typed, tested, convention-aware |
Switch modes in-session with `/mode <mode>`.
| Command | Description |
|---|---|
| `/help` | Available commands |
| `/new` | Create a new project |
| `/scan` | Scan the project into `docs/scan.md` |
| `/mode <mode>` | Switch agent mode |
| `/dir <path>` | Change working directory |
| `/clear` | Clear terminal |
| `/status` | Session status and config |
| `/history` | Conversation history |
| `/undo` | Undo last file change |
| `/tokens` | Token usage |
| `/plan` | Current plan progress |
| `/exit` | End session |
Automate actions before or after tool calls. Config lives in `.ultraagent/hooks.json`.
```json
{
  "hooks": [
    {
      "timing": "after",
      "tool": "write_file",
      "command": "npx prettier --write ${file}",
      "enabled": true
    },
    {
      "timing": "after",
      "tool": "edit_file",
      "command": "npm test",
      "enabled": true
    }
  ]
}
```

| Field | Type | Description |
|---|---|---|
| `timing` | `before` \| `after` | When to run |
| `tool` | string | Tool name (`*` for all) |
| `command` | string | Shell command |
| `blocking` | boolean | Abort tool call on hook failure (`before` hooks only) |
| `enabled` | boolean | Active state |
Variables: `${file}` (file path), `${tool}` (tool name)
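The `blocking` field pairs naturally with `before` hooks: if the command exits non-zero, the tool call is aborted. A sketch that gates every file write behind a type check (assumes your project defines a `typecheck` npm script; adjust to taste):

```json
{
  "hooks": [
    {
      "timing": "before",
      "tool": "write_file",
      "command": "npm run typecheck",
      "blocking": true,
      "enabled": true
    }
  ]
}
```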
UltraAgent automatically enriches every LLM request with project context:
| Source | Description |
|---|---|
| Project detection | Language, framework, package manager, scripts |
| `docs/scan.md` | Full project scan |
| Persistent memory | Saved context from prior sessions |
| Active plan | Task plan with status |
| `ULTRAAGENT.md` | Project-specific custom instructions |
| Git status | Branch, changes, recent commits |
Priority: Mode prompt > Project context > scan.md > Memories > Plan > History
Create `ULTRAAGENT.md` in your project root:

```markdown
# Project Instructions

- TypeScript monorepo using pnpm
- Run `pnpm test` after changes
- Use conventional commits
```

Global instructions: `~/.ultraagent/instructions.md`
```
ultraagent local setup                          # auto-detect & configure (recommended)
ultraagent config set provider local-openai
ultraagent config set localModel qwen2.5-coder:14b
ultraagent config show                          # view config
ultraagent config reset                         # reset to defaults
```

Or via environment variables:

```
ULTRAAGENT_PROVIDER=local-openai
ULTRAAGENT_LOCAL_BASE_URL=http://localhost:1234/v1
ULTRAAGENT_LOCAL_MODEL=qwen2.5-coder:14b
ULTRAAGENT_LOCAL_CONTEXT_LENGTH=32768
ULTRAAGENT_LOCAL_TEMPERATURE=0.7
ULTRAAGENT_LOCAL_GPU_LAYERS=-1
ULTRAAGENT_LOCAL_BATCH_SIZE=512
ULTRAAGENT_LOCAL_FLASH_ATTENTION=true
ULTRAAGENT_LOCAL_KV_CACHE_TYPE=q8_0
```

All Config Options
| Option | Default | Description |
|---|---|---|
| `model` | `qwen2.5-coder:14b` | Model identifier |
| `provider` | `local-openai` | `ollama` or `local-openai` |
| `maxTokens` | `8192` | Max output tokens per response |
| `streaming` | `true` | Real-time token streaming |
| `theme` | `default` | `default` · `minimal` · `verbose` |
| `history` | `true` | Keep conversation history |
| `maxSubAgents` | `5` | Max concurrent sub-agents |
| `localBaseUrl` | (auto) | Local LLM server URL (auto-detected per provider) |
| `localModel` | — | Local model name |
| `localGpuLayers` | `-1` | GPU layers (`-1` = all) |
| `localContextLength` | `32768` | Context window size |
| `localTemperature` | `0.7` | Sampling temperature |
| `localBatchSize` | `512` | Inference batch size |
| `localFlashAttention` | `true` | Enable flash attention |
| `localKvCacheType` | `q8_0` | KV-cache quantization (`f16`, `q8_0`, `q4_0`) |
```
src/
├── cli.ts                    # CLI entry point (Commander.js)
├── index.ts                  # Programmatic API exports
├── commands/                 # Command implementations
│   ├── new.ts                # Project creation wizard
│   ├── scan.ts               # Project scanner
│   ├── chat.ts               # Interactive session
│   ├── edit.ts               # Code editing
│   ├── run.ts                # Single prompt execution
│   ├── config-cmd.ts         # Configuration management
│   ├── local.ts              # Local LLM auto-setup
│   └── models.ts             # Model recommendations
├── core/                     # Core engine
│   ├── agent-factory.ts      # Agent loop, streaming, context injection
│   ├── session.ts            # Interactive REPL with slash commands
│   ├── context.ts            # Git context & project instructions
│   ├── context-manager.ts    # Token-aware history management
│   ├── local-llm.ts          # Local LLM provider adapters
│   ├── planner.ts            # Plan creation & task tracking
│   ├── hooks.ts              # Pre/post tool execution hooks
│   ├── token-tracker.ts      # Usage tracking
│   └── undo.ts               # File change rollback
├── tools/                    # Tool system (29 tools)
│   ├── base.ts               # Registry, permission levels, definitions
│   ├── permissions.ts        # Permission manager & session allowlist
│   ├── read.ts / write.ts / edit.ts / bash.ts / glob.ts / grep.ts
│   ├── git.ts                # 10 git tools + 2 GitHub tools (gh_pr, gh_issue)
│   ├── sub-agent.ts          # Parallel sub-agent execution
│   ├── memory.ts             # memory_save, memory_search, memory_delete
│   ├── web.ts                # web_search, web_fetch
│   └── index.ts              # Barrel exports
├── ui/                       # Terminal UI
│   ├── theme.ts              # Colors, gradients, ASCII banner
│   ├── spinner.ts            # Loading spinners
│   └── permission-prompt.ts  # Interactive permission dialog
└── utils/
    ├── config.ts             # Zod-validated config with env support
    ├── hardware.ts           # Apple Silicon hardware detection
    ├── backend-detect.ts     # LLM backend auto-detection
    ├── model-recommender.ts  # RAM-based model recommendation engine
    └── files.ts              # Project detection & analysis
```
| Category | Technology |
|---|---|
| Language | TypeScript 5.7 (ES Modules) |
| AI | LangChain · LangGraph |
| LLM Backends | Ollama · LM Studio · MLX-LM · llama.cpp |
| CLI | Commander.js · Inquirer.js |
| Validation | Zod |
| UI | Chalk · Ora · Boxen · Gradient-string · cli-table3 |
| Testing | Vitest |
```
npm run build        # compile TypeScript -> dist/
npm run dev          # run via tsx (no build)
npm start            # run compiled output
npm test             # run tests
npm run test:watch   # tests in watch mode
npm run clean        # remove dist/
```

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

- Fork the repository
- Create your branch (`git checkout -b feature/my-feature`)
- Commit your changes
- Push to the branch
- Open a Pull Request
- Smart context management — automatically reduces tools and prompt size to fit within model's context window
- Improved output formatting — cleaner responses with structured formatting rules (status icons, tables, tree output)
- 11 new tools — `git_merge`, `git_revert`, `git_stash`, `git_cherry_pick`, `git_rebase`, `gh_pr`, `gh_issue`, `sub_agent_batch`, `sub_agent_status`, `plan_create`/`plan_status`/`plan_update`
- Context overflow detection — clear error message with actionable solutions when context window is too small
- Tool token estimation — dynamic tool binding based on available context budget
- Initial release with 18 tools, agentic loop, sub-agents, memory, project scanner, auto-setup