| Component | Language | Description | Key Features |
|---|---|---|---|
| DLSlime | C++ | RDMA communication | Zero-copy KV cache migration, P2P mesh networking, GPUDirect RDMA |
| NanoCtrl | Rust | Control plane | Redis-backed service registry, health monitoring, engine discovery, Python client |
| NanoDeploy | Python/C++ | LLM inference engine | Prefill/decode engines, KV cache management, continuous batching, Ray-based distributed workers |
| NanoDeployVL | Python | Vision-Language encoder | EP-separated ViT encoder, RDMA embedding transfer, Qwen3-VL support |
| NanoRoute | Rust | HTTP load balancer | OpenAI-compatible API, tool calls, routing strategies, engine discovery |
```mermaid
graph TB
    Client[Client Layer<br/>HTTP Requests / OpenAI SDK]
    Route[NanoRoute<br/>Rust/HTTP<br/>Load Balancer]
    VL[NanoDeployVL<br/>Vision Encoder]
    Prefill[Prefill Engine<br/>Python/C++]
    Decode[Decode Engine<br/>Python/C++]
    Ctrl[NanoCtrl<br/>Redis<br/>Service Registry]
    Client -->|HTTP| Route
    Route -->|ZMQ| VL
    Route -->|ZMQ| Prefill
    Route -->|ZMQ| Decode
    VL -->|RDMA<br/>Embeddings| Prefill
    Prefill -->|RDMA<br/>KV Migration| Decode
    VL -->|Register/Heartbeat| Ctrl
    Prefill -->|Register/Heartbeat| Ctrl
    Decode -->|Register/Heartbeat| Ctrl
    Route -->|Engine Discovery| Ctrl
```
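The register/heartbeat/discovery loop in the diagram can be pictured with a small in-memory sketch. NanoCtrl actually backs this with Redis and its own schema; the class, method names, and TTL below are illustrative assumptions, not NanoCtrl's API:

```python
import time

# Illustrative TTL: an engine whose last heartbeat is older than this
# is considered dead and excluded from discovery results.
TTL_SECONDS = 5.0

class Registry:
    """Toy stand-in for NanoCtrl's Redis-backed service registry."""

    def __init__(self):
        self._engines = {}  # engine_id -> (role, address, last_heartbeat)

    def register(self, engine_id, role, address):
        self._engines[engine_id] = (role, address, time.monotonic())

    def heartbeat(self, engine_id):
        role, address, _ = self._engines[engine_id]
        self._engines[engine_id] = (role, address, time.monotonic())

    def discover(self, role):
        # Return addresses of live engines with the requested role,
        # dropping entries whose heartbeat has expired.
        now = time.monotonic()
        return [addr for r, addr, ts in self._engines.values()
                if r == role and now - ts < TTL_SECONDS]

reg = Registry()
reg.register("prefill-0", "prefill", "10.0.0.1:6002")
reg.register("decode-0", "decode", "10.0.0.2:6001")
reg.heartbeat("decode-0")  # engines refresh liveness periodically
print(reg.discover("prefill"))  # ['10.0.0.1:6002']
```

This is the shape of the flow NanoRoute relies on: engines register and heartbeat, the router only ever routes to engines that `discover()` still returns.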
The root `pyproject.toml` acts as a meta-package that lets you install any combination of the Python components in a single command.
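The extras behind the install commands below would be declared along these lines (an illustrative sketch of the mechanism, not the project's actual file):

```toml
[project.optional-dependencies]
dlslime      = ["dlslime"]
nanoctrl     = ["nanoctrl"]
nanodeploy   = ["nanodeploy"]
nanodeployvl = ["nanodeployvl"]
all          = ["dlslime", "nanoctrl", "nanodeploy", "nanodeployvl"]
```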
```bash
pip install ".[all]"
pip install ".[dlslime]"       # DLSlime transfer engine only
pip install ".[nanoctrl]"      # NanoCtrl lifecycle client only
pip install ".[nanodeploy]"    # NanoDeploy inference engine only
pip install ".[nanodeployvl]"  # NanoDeployVL vision-language encoder only
```

```bash
# Build NanoDeploy C++ extensions in-place
cd NanoDeploy && pip install -e . && cd ..

# Build NanoRoute (Rust)
cd NanoRoute && cargo build --release && cd ..

# Build NanoCtrl (Rust) + install Python client
cd NanoCtrl && cargo build --release && pip install -e . && cd ..
```

Prefill-decode disaggregation splits prompt processing (prefill) and token generation (decode) across separate GPU nodes connected via RDMA.
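To make the prefill→decode handoff concrete: the KV cache is managed in fixed-size token blocks (see the `--kvcache_block_size` flags in the examples below), so the number of blocks migrated per request follows directly from prompt length and block size. A back-of-the-envelope sketch; the function name and the whole-blocks-per-sequence simplification are ours, not NanoDeploy's API:

```python
import math

def kv_blocks_to_migrate(prompt_tokens: int, block_size: int) -> int:
    """Number of fixed-size KV-cache blocks covering a prompt.

    Sketch only: a real engine also tracks partially filled blocks,
    layers, and heads; here we just count whole blocks per sequence.
    """
    return math.ceil(prompt_tokens / block_size)

# A 1000-token prompt with the 256-token blocks used in the offline
# example needs 4 blocks; with the 64-token blocks used by the online
# engine servers it needs 16.
print(kv_blocks_to_migrate(1000, 256))  # 4
print(kv_blocks_to_migrate(1000, 64))   # 16
```

Smaller blocks waste less memory on the last partial block but mean more RDMA descriptors per migration; that trade-off is why the offline and online examples pick different sizes.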
- 2 nodes with NVIDIA GPUs (SM90+ for FP8), RDMA-capable NICs
- Redis, Ray cluster, Rust toolchain
```bash
# Node 0 (head)
ray start --head --port=7078 --dashboard-host=0.0.0.0

# Node 1 (multi-node only)
ray start --address <node0-ip>:7078
```

Batch generation without HTTP serving.
```bash
python NanoDeploy/examples/non_disagg.py \
    --model /models/Qwen3-235B-A22B \
    --ray_address <node0-ip>:7078 \
    --master_address <node0-ip>:6006 \
    --attention_dp 8 --ffn_ep 8 \
    --kvcache_block_size 256 \
    --prompt "1+1=?" --max_tokens 128
```

```bash
redis-server --bind 0.0.0.0 --port 6379
cd NanoCtrl && cargo run --release   # edit config.toml to set redis_url
```

```bash
python NanoDeploy/examples/disagg.py \
    --model /models/Qwen3-235B-A22B \
    --ray_address <node0-ip>:7078 \
    --nanoctrl_address <node0-ip>:3000 \
    --attention_dp 8 --ffn_ep 8 \
    --prefill.master_address <node0-ip>:6006 \
    --decode.master_address <node1-ip>:6006
```

ZMQ engine servers with OpenAI-compatible HTTP API via NanoRoute.
```bash
redis-server --bind 0.0.0.0 --port 6379
cd NanoCtrl && cargo run --release    # edit config.toml to set redis_url
cd NanoRoute && cargo run --release   # edit config.toml to set nanoctrl_address
```

```bash
# Terminal 1 - Decode engine
python NanoDeploy/nanodeploy/server/engine_server.py \
    --model /models/Qwen3-235B-A22B \
    --mode decode \
    --ray_address <node0-ip>:7078 \
    --nanoctrl_address <node0-ip>:3000 \
    --nanoctrl_scope nanoctrl-0 \
    --master_address <node1-ip>:6006 \
    --host <node0-ip> --port 6001 \
    --attention_dp 8 --ffn_ep 8 \
    --kvcache_block_size 64 \
    --max_num_batched_tokens 16384 --max_model_len 16384

# Terminal 2 - Prefill engine
python NanoDeploy/nanodeploy/server/engine_server.py \
    --model /models/Qwen3-235B-A22B \
    --mode prefill \
    --ray_address <node0-ip>:7078 \
    --nanoctrl_address <node0-ip>:3000 \
    --nanoctrl_scope nanoctrl-0 \
    --master_address <node0-ip>:6006 \
    --host <node0-ip> --port 6002 \
    --attention_dp 8 --ffn_ep 8 \
    --kvcache_block_size 64 \
    --max_num_batched_tokens 16384 --max_model_len 16384
```

```bash
curl http://<node0-ip>:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "/models/Qwen3-235B-A22B", "messages": [{"role": "user", "content": "Hello"}]}'
```

See individual component licenses.
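For reference, the curl request shown above can also be constructed from Python with only the standard library (a sketch: the host placeholder and model path are the same as above, and `build_chat_request` is our own helper name, not part of any component):

```python
import json
from urllib import request

def build_chat_request(base_url: str, model: str, content: str) -> request.Request:
    # Build an OpenAI-compatible /v1/chat/completions request. Actually
    # sending it requires a running NanoRoute instance, so this sketch
    # only constructs the request object.
    payload = {"model": model,
               "messages": [{"role": "user", "content": content}]}
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://<node0-ip>:8080",
                         "/models/Qwen3-235B-A22B", "Hello")
# request.urlopen(req) would return the chat completion JSON once the
# NanoRoute / engine stack is up.
print(req.full_url)
```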
- Issues: GitHub Issues
- Documentation: Check component READMEs