Summary
The Smoke Codex workflow failed on the agent job for a scheduled run.
- Run: 22953259892
- Commit: `dda2d3161ecaf546f306752680d16a16d33fca34`
- Trigger: `schedule` (cron `28 */12 * * *`)
- Time: 2026-03-11T12:46:22Z
Root Cause
The Codex (OpenAI) agent made exactly 1 API call to api.openai.com but did not invoke any safe output tools afterward. The smoke test framework requires agents to call at least one safe output tool to confirm task completion.
```
##[error]No safe outputs were invoked. Smoke tests require the agent to call safe output tools.
##[error]Process completed with exit code 1.
```
**Firewall activity summary from the run:**
```
▼ 1 request | 1 allowed | 0 blocked | 1 unique domain
| Domain         | Allowed | Denied |
|----------------|---------|--------|
| api.openai.com | 1       | 0      |
```
Only 1 outbound request was made to api.openai.com, suggesting the model received the prompt but either:
- Completed its response text-only without calling any tool
- Hit a budget/context/token limit before calling safe output tools
- Encountered an error that prevented tool use
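The failing check can be pictured roughly as follows. This is a minimal sketch, not the smoke test framework's actual code: the function name `check_safe_outputs` and the contents of `SAFE_OUTPUT_TOOLS` are assumptions (only `noop` and `add_comment` are named in this report), and the error string is copied from the log above.

```python
# Hypothetical set of safe output tools; only these two are named in this report.
SAFE_OUTPUT_TOOLS = frozenset({"noop", "add_comment"})

NO_SAFE_OUTPUT_ERROR = (
    "##[error]No safe outputs were invoked. "
    "Smoke tests require the agent to call safe output tools."
)


def check_safe_outputs(invoked_tools):
    """Return (exit_code, messages) for the tool names the agent called.

    Illustrative only: the real framework may record tool calls differently.
    """
    safe_calls = [name for name in invoked_tools if name in SAFE_OUTPUT_TOOLS]
    if not safe_calls:
        # Mirrors the failure seen in this run: a text-only response, no tool call.
        return 1, [NO_SAFE_OUTPUT_ERROR]
    return 0, []


# A text-only response (no tool calls) fails the check.
exit_code, messages = check_safe_outputs([])
```

Under this model, all three hypotheses above produce the same symptom: an empty list of safe output calls, which is why the log alone cannot distinguish between them.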
Pattern Analysis
Looking at recent smoke-codex runs:
| Run | # | Trigger | Conclusion |
|---|---|---|---|
| 22953259892 | 891 | schedule | ❌ failure — no safe outputs |
| 22932379700 | 890 | pull_request | ❌ failure — called output but not add_comment |
| 22932264163–22932059934 | 884–889 | pull_request | ✅ success |
The scheduled run (which has no PR context) appears to consistently fail with "no safe outputs," suggesting the Codex model may not be following instructions to call safe output tools when there is no PR context to comment on.
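To see how much evidence sits behind each trigger, the table above can be tallied per trigger. The sketch below hard-codes the runs listed in the table (run numbers 884 to 891); the helper name `summarize_by_trigger` is illustrative and not part of any tooling mentioned here.

```python
from collections import defaultdict

# Runs transcribed from the table above: (run number, trigger, conclusion).
RUNS = [
    (891, "schedule", "failure"),
    (890, "pull_request", "failure"),
] + [(n, "pull_request", "success") for n in range(884, 890)]


def summarize_by_trigger(runs):
    """Count successes and failures for each trigger type."""
    counts = defaultdict(lambda: {"success": 0, "failure": 0})
    for _number, trigger, conclusion in runs:
        counts[trigger][conclusion] += 1
    return {trigger: dict(c) for trigger, c in counts.items()}


summary = summarize_by_trigger(RUNS)
# summary["schedule"]     -> {"success": 0, "failure": 1}
# summary["pull_request"] -> {"success": 6, "failure": 1}
```

Within this window there is only one scheduled run, so the tally is consistent with the hypothesis but cannot confirm it on its own; more scheduled runs would be needed.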
Recommended Actions
- Review the Smoke Codex prompt (`smoke-codex.md`) to ensure it explicitly instructs the agent to call a safe output tool (e.g., `noop`) when running on a schedule trigger with no actionable output.
- Check if the Codex model/API version was updated recently; a model change could affect tool-calling behavior.
- Re-run the workflow to see if this is intermittent or consistently failing on schedule.
- Compare schedule vs PR instructions in the prompt — the PR run (890) at least called some safe output tool, while the schedule run called none at all.
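One way to frame the schedule vs PR comparison is as a per-trigger expectation. The sketch below is hypothetical: the mapping (PR runs expect `add_comment`, all other triggers fall back to `noop`) is an assumption drawn from the failure notes for runs 890 and 891, not something read from `smoke-codex.md`, and the function names are invented for illustration.

```python
def expected_safe_output(trigger):
    """Return the safe output tool a run is assumed to need for this trigger.

    Assumption: PR runs must comment on the PR; anything else has nothing
    to comment on and should still signal completion via noop.
    """
    if trigger == "pull_request":
        return "add_comment"
    return "noop"


def meets_expectation(trigger, invoked_tools):
    """True if the agent called the tool expected for this trigger."""
    return expected_safe_output(trigger) in invoked_tools


# Run 891 (schedule): no tools were called at all.
run_891_ok = meets_expectation("schedule", [])
# A scheduled run that ends with a noop call would pass under this model.
schedule_with_noop_ok = meets_expectation("schedule", ["noop"])
```

If the prompt only spells out the PR branch of this logic, a scheduled run has no stated tool to call, which would match the behavior seen in run 891.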
Generated by CI Doctor