English | 简体中文
⚠️ DEPRECATED — no longer maintained.This skill has been abandoned. Do not install it. The documentation below the divider is kept only for historical reference. New skills that explicitly replace this workflow are not planned.
The strict-atomic-round + durable-state pattern this skill implemented is no longer worth running as a separate skill. Both CLIs now have native features that cover the realistic use cases. Pick the path that matches your CLI.
-
Uninstall:
rm -rf "${CODEX_HOME:-$HOME/.codex}/skills/strict-agent-loop" -
For in-progress workspaces, the
.codex-loop/directories are now inert. Archive them if you still want the event history (tar czf codex-loop-archive.tgz .codex-loop/), otherwise delete them. -
If you still want strict atomic-step execution from Codex, use the built-in primitives rather than a skill:
- Write the atomic-step contract directly into your prompt (one bounded task per round, with one machine-checkable success condition).
- Use Codex's own session resumption instead of a custom disk-backed loop.
- If durable progress really matters, write a short append-only log in the workspace yourself — simpler than a skill and harder to drift.
This skill was never built for Claude Code, and is no longer recommended for either CLI. Nothing to uninstall.
For the same "strict atomic rounds with durable state" shape on Claude Code, use the built-in features:
- The task system (
TaskCreate/TaskUpdate/TaskList) tracks atomic rounds in-session without extra infrastructure. - Subagents under
.claude/agents/isolate each round in its own context. - Plan mode commits the agent to a bounded plan before execution.
- Hooks in
settings.json(SessionStart,PreToolUse,Stop) cover the "persist state across sessions" need without a separate runtime.
The sections below describe the original (now-deprecated) skill. They are preserved for auditing past usage. Do not install or run this skill.
- Codex often compresses long work into a vague summary and skips the middle.
- You want every round to have one bounded atomic task and one explicit local done condition.
- You want progress, announcements, and recovery state to survive context loss.
- You want unattended work to keep looping until a real stop condition is reached.
- You want disk-backed progress broadcasts so a long run does not look dead.
- You may have many different long-running loops in one repo, so the runtime needs task management and namespacing.
This skill does not add a magical infinite runtime to Codex. It implements a strict loop by combining:
- a controller protocol
- disk-backed state and append-only ledgers
- optional outer supervision for unattended runs
Two operating modes share the same task model:
interactive: the current Codex session is the controller and reports every round to the userunattended:scripts/supervise.pyowns the outer repetition and repeatedly runs or resumes Codex
The inner loop stays the same in both modes:
- read the authoritative task state from disk
- announce the next atomic round
- do exactly one small task
- verify it with evidence
- persist the verified round
- re-run machine-checkable stop rules
- refresh progress broadcasts and summaries
- continue unless the stop condition or a real blocker has been reached
One repo can host many strict loops at the same time. The default layout is manager-based:
<workspace-root>/
└── .codex-loop/
├── registry.json
└── tasks/
├── parser-fix/
│ ├── state.json
│ ├── events.jsonl
│ ├── iterations.jsonl
│ ├── status-history.jsonl
│ ├── latest-status.txt
│ ├── latest-stop-report.json
│ ├── run-summary.md
│ ├── rounds/
│ └── supervisor/
└── docs-cleanup/
└── ...
registry.json is the manager index.
Each task has its own durable state and logs under tasks/<task-id>/.
This is the main anti-conflict mechanism:
- a repo can have many loops
- each loop gets a stable
task-id - mutation scripts operate on one explicit task state file
list_tasks.pyandshow_task.pyprovide lightweight management
Keep these two areas separate:
- Actual task outputs live under
<workspace-root>/, such assrc/,docs/,tests/, oroutput/. - Loop bookkeeping lives under
.codex-loop/tasks/<task-id>/, includingstate.json, logs, stop reports, and round summaries.
This matters in practice because a vague prompt can make Codex accidentally write deliverables into the task ledger directory. The task root is for control-plane state unless you explicitly want a bookkeeping artifact there.
git clone https://github.com/HansBug/strict-agent-loop "${CODEX_HOME:-$HOME/.codex}/skills/strict-agent-loop"Then invoke it as $strict-agent-loop.
git clone https://github.com/HansBug/strict-agent-loop /path/to/strict-agent-loopPrompt Codex like this:
Use the $strict-agent-loop skill located at /path/to/strict-agent-loop for this task.
Create one managed task per long-running objective. Use a stable task-id when you know you will need to resume or supervise it later.
Initialize two different tasks in the same repo:
REPO=/abs/path/to/repo
SKILL=/path/to/strict-agent-loop
python "$SKILL/scripts/init_state.py" \
--workspace-root "$REPO" \
--task-id parser-fix \
--goal "Fix the parser bug in strict atomic rounds." \
--global-stop-condition "Stop only when pytest passes and the parser regression test exists." \
--success-evidence "pytest -q passes" \
--stop-command "pytest -q" \
--require-path tests/test_parser_regression.py
python "$SKILL/scripts/init_state.py" \
--workspace-root "$REPO" \
--task-id docs-cleanup \
--goal "Clean up the release documentation in strict atomic rounds." \
--global-stop-condition "Stop only when the final release note exists and contains the required summary." \
--success-evidence "release note written" \
--require-path docs/release-note.md \
--require-text "docs/release-note.md::Release summary"List and inspect them:
python "$SKILL/scripts/list_tasks.py" --workspace-root "$REPO"
python "$SKILL/scripts/show_task.py" --workspace-root "$REPO" --task-id parser-fixIf you omit --task-id, init_state.py generates one from the goal and timestamp.
Initialize the managed task:
REPO=/abs/path/to/repo
TASK_ID=parser-fix
STATE="$REPO/.codex-loop/tasks/$TASK_ID/state.json"
SKILL=/path/to/strict-agent-loop
python "$SKILL/scripts/init_state.py" \
--workspace-root "$REPO" \
--task-id "$TASK_ID" \
--goal "Fix the parser safely in strict atomic rounds." \
--global-stop-condition "Stop only when pytest passes, the regression test exists, and the bug is fixed." \
--success-evidence "pytest -q passes" \
--next-task "Reproduce the parser failure in one minimal, verifiable step." \
--stop-command "pytest -q" \
--require-path tests/test_parser_regression.pyThen prompt Codex explicitly:
Use $strict-agent-loop for this repository.
Read /abs/path/to/repo/.codex-loop/tasks/parser-fix/state.json before acting.
Treat /abs/path/to/repo as the workspace root for real work artifacts.
Treat /abs/path/to/repo/.codex-loop/tasks/parser-fix/ as bookkeeping only: state, logs, stop reports, and round summaries.
This is interactive mode.
Before each round, tell me:
- the iteration number
- how many verified rounds are already complete
- the one atomic task for this round
- the local done condition
- the global stop condition
- the condition under which the loop may stop after this round
- the recent average round time and ETA if available
Write the same announcement to the task-local events.jsonl.
After each round, verify it, run check_stop.py, then run report_status.py.
Do not widen scope and do not claim completion before the stop checks pass.
Initialize the task in unattended mode:
REPO=/abs/path/to/repo
TASK_ID=nightly-parser-fix
STATE="$REPO/.codex-loop/tasks/$TASK_ID/state.json"
SKILL=/path/to/strict-agent-loop
python "$SKILL/scripts/init_state.py" \
--workspace-root "$REPO" \
--task-id "$TASK_ID" \
--operating-mode unattended \
--goal "Finish the queued parser task without skipping the middle." \
--global-stop-condition "Stop only when python verify_task.py returns 0 and output/final-report.md exists." \
--success-evidence "python verify_task.py returns 0" \
--next-task "Start from the current repo state and make one minimal verified advance." \
--stop-command "python verify_task.py" \
--require-path output/final-report.md \
--max-iterations 200 \
--supervisor-reasoning-effort medium \
--supervisor-max-rounds-per-invocation 5 \
--supervisor-max-consecutive-failures 3By default, unattended mode now starts each Codex invocation fresh and recovers from disk. If you explicitly want to reuse the same inner Codex thread between invocations, add --supervisor-resume-existing-thread when you initialize the task.
If your provider tends to reject very heavy runs during busy periods, set --supervisor-reasoning-effort low or medium for better availability.
Start the supervisor:
python "$SKILL/scripts/supervise.py" \
--state "$STATE" \
--skill-path "$SKILL" \
--heartbeat-seconds 30 \
--max-invocation-seconds 1800 \
--max-cycles 200 \
--prompt-note "Keep each round atomic, persist every announcement and status update, and do not stop until the machine checks pass or a real blocker is recorded."The supervisor keeps broadcasting liveness and progress to the task-local files, including:
- completed iteration count
- progress bar style status
- recent round durations
- recent average round duration
- estimated remaining time when there is enough signal
It also relays inner Codex announcements plus command start/completion events to outer stdout, so an operator watching the supervisor can tell whether the run is actively moving or has stalled.
If you send SIGINT or SIGTERM to supervise.py, it saves the latest state, records an interruption event, and exits with code 130 so you can resume the same managed task later.
This is a good end-to-end stress test because the total number of rounds is not obvious up front and each round can be forced to do exactly one small step.
Use $strict-agent-loop for this repository.
The workspace root is this repo itself.
Keep output/sequence.json and output/report.md under the workspace root, not inside .codex-loop/tasks/<task-id>/.
The task is to build the hailstone sequence starting from 27.
Each round may compute and append exactly one next number. Never batch multiple steps into one round.
Persist the full sequence to output/sequence.json.
After the sequence reaches 1, spend one extra round writing output/report.md that summarizes the full sequence from disk.
Stop only when python verify_hailstone.py returns 0.
Every round must be announced, verified, persisted, and reported through strict-agent-loop.
The loop is stricter than a normal Codex prompt, but it is still subject to Codex session length, tool availability, auth state, and context limits. For real unattended work in your own repo, these practices matter:
- Use one stable
task-idper unattended objective so you can resume the exact same task later. - Keep the task state inside the target repo so the ledger survives terminal sessions and machine restarts.
- Keep real work artifacts in the workspace root and reserve
.codex-loop/tasks/<task-id>/for loop bookkeeping. - Prefer one tiny verifier script in the target repo and make it the primary
--stop-command. - Keep
--supervisor-max-rounds-per-invocationmodest so durable checkpoints are frequent. - Default unattended recovery is disk-first. Only enable
--supervisor-resume-existing-threadwhen you specifically want the same inner Codex thread and trust that environment. - Set
--supervisor-reasoning-effort lowormediumwhen you need higher availability fromcodex exec; leaving it empty uses your normal Codex config default. - Use a generous
--max-invocation-secondsso slow synthesis or reporting rounds can finish, but still set a ceiling so a bad nested Codex invocation fails loudly instead of hanging forever. - If an invocation times out or exits non-zero after already persisting verified progress, the supervisor keeps that progress and does not count the run as an additional consecutive failure.
- If a real write command hits a read-only filesystem error during unattended execution, the supervisor disables stored thread resume and falls back to fresh disk recovery on the next cycle.
- Send
Ctrl-Corkill -TERMto the supervisor when you need to pause it; it records the interruption, saves state, and exits130. - The supervisor runs
codex execwith--skip-git-repo-check, so unattended work does not depend on the repo looking perfectly clean. - In prompts, explicitly tell Codex not to re-read full
TASK.mdor fullstate.jsonevery round. After initial recovery, it should inspect only the targeted workspace artifacts and the latest persisted status it actually needs. - Watch
latest-status.txt,status-history.jsonl, andrun-summary.mdinstead of trusting the console alone. - Compact with
compact_state.pywhen the controller starts carrying too much history in memory. - If a run must recover, reuse the same task state path instead of creating a new task unless the goal really changed.
- If you run several loops in one repo, use
registry.jsonpluslist_tasks.pyandshow_task.pyto avoid collisions. - Progress bars and ETA are heuristic when your stop rules are binary, so pick realistic
max_iterationsvalues.
A correct answer should include:
- both interactive and unattended quick starts when you did not choose a mode
- the managed layout under
.codex-loop/registry.jsonand.codex-loop/tasks/<task-id>/ - the key durable artifacts and where they live
- a reminder that unattended runs should rely on machine-checkable stop rules
- exact shell commands or prompt text, not only prose
strict-agent-loop/
├── AGENTS.md
├── SKILL.md
├── README.md
├── README_zh.md
├── agents/openai.yaml
├── scripts/
│ ├── append_event.py
│ ├── check_stop.py
│ ├── compact_state.py
│ ├── init_state.py
│ ├── json_get.py
│ ├── list_tasks.py
│ ├── report_status.py
│ ├── show_task.py
│ ├── state_tools.py
│ ├── stop_tools.py
│ ├── supervise.py
│ └── update_state.py
└── references/
├── management.md
├── modes.md
├── prompt_templates.md
├── protocol.md
├── recovery.md
├── state_schema.md
└── stop_checks.md
- You can run
python ~/.codex/skills/.system/skill-creator/scripts/quick_validate.py /path/to/strict-agent-loop. - You can run the lifecycle scripts directly against a temporary workspace.
- The hailstone / Collatz scenario is the best practical forward test because it catches fake batching and weak finalization.
- The GitHub Actions workflow checks the stdlib-only scripts against Python
3.7through3.14. - Python
3.7is validated onubuntu-22.04in CI because newer Ubuntu images do not reliably provide it.
supervise.py is not fully exercised in CI because it depends on a working local codex binary plus valid session or auth state.