fix(gateway): detect usage-limit text in SSE bodies and demote candidate accounts by quota by maiqigh · Pull Request #146 · qxcnm/Codex-Manager

maiqigh · 2026-04-20T14:00:04Z

Background

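OpenAI does not always report a usage limit via event: response.failed. In some cases it returns 200 OK and streams "You've hit your usage limit. To get more access now, ..." in SSE data frames as if it were ordinary assistant content. The HTTP SSE path currently does not recognize this shape, which leads to the following: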

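- bridge.stream_terminal_error is never set, so response_finalize does not take the failover path
- the account is never marked via mark_account_cooldown or mark_account_unavailable_for_gateway_error
- the next request is routed to the same nearly exhausted account, and the user keeps seeing "You've hit your usage limit"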

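In responses_websocket.rs, infer_ws_terminal_status already performs a similar check (crate::account_status::usage_limit_reason_from_message), but the HTTP SSE passthrough path has no equivalent. In addition, list_gateway_candidates filters only on used_percent < 100, while in practice OpenAI starts rejecting requests at 95-99%, so healthy accounts are not ranked ahead of nearly exhausted ones.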

Closes #118

Changes

Fix A — `passthrough.rs::update_usage_from_frame`

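Before any of the existing branches, scan the data: payload; if usage_limit_reason_from_message matches, set collector.saw_terminal = true and collector.terminal_error = Some(payload). The existing early-return branch, inspection.usage.is_none() && inspection.terminal.is_none(), is bypassed by frames that carry delta text (a delta calls get_or_insert, which makes usage Some), so the scan must be placed at the very beginning.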
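A minimal sketch of that first-pass scan, assuming a simplified collector. `Collector`, `frame_data`, `looks_like_usage_limit`, and `scan_usage_limit` are hypothetical names used only for illustration, and the phrase list is a stand-in for `usage_limit_reason_from_message`. The `data:` handling follows the SSE format: multiple `data:` lines are joined with newlines, and one leading space after the colon is dropped.

```rust
/// Hypothetical, simplified stand-in for the passthrough stream collector.
#[derive(Debug, Default)]
pub struct Collector {
    pub saw_terminal: bool,
    pub terminal_error: Option<String>,
}

/// Stand-in matcher (assumed phrases); the real check is
/// `usage_limit_reason_from_message`.
fn looks_like_usage_limit(text: &str) -> bool {
    let t = text.to_lowercase();
    [
        "you've hit your usage limit",
        "the usage limit has been reached",
        "insufficient_quota",
    ]
    .iter()
    .any(|needle| t.contains(*needle))
}

/// Joins the `data:` lines of one SSE frame; returns None for frames
/// without data (pure `event:` lines or `:` keepalive comments).
fn frame_data(frame: &str) -> Option<String> {
    let parts: Vec<&str> = frame
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        // Per the SSE format, a single leading space after the colon is dropped.
        .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n"))
    }
}

/// Meant to run before any usage/terminal inspection, so delta frames
/// (which already carry a Some usage) cannot bypass it.
pub fn scan_usage_limit(collector: &mut Collector, frame: &str) {
    if let Some(payload) = frame_data(frame) {
        if looks_like_usage_limit(&payload) {
            collector.saw_terminal = true;
            collector.terminal_error = Some(payload);
        }
    }
}
```

Ordinary deltas and keepalive frames leave the collector untouched, while an apology delta marks the stream as terminated with the raw payload as the error.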

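Downstream, response_finalize then sees bridge.is_ok(is_stream) = false → final_error = Some(usage-limit payload) → should_failover_for_gateway_error returns true → the FinalizeUpstreamResponseOutcome::Failover + mark_account_cooldown path is taken. The bytes of the current request have already been streamed to the client and cannot be rolled back, but the account is recorded and put into cooldown, so the next request skips it.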
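The decision chain above can be modeled as a small pure function. This is a hypothetical simplification: `Outcome`, `should_failover`, and `finalize` stand in for `FinalizeUpstreamResponseOutcome`, `should_failover_for_gateway_error`, and `response_finalize`, whose real signatures are not shown in this PR, and the classifier's phrase check is assumed.

```rust
/// Hypothetical, simplified outcome type; the real enum is
/// `FinalizeUpstreamResponseOutcome` and carries more data.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Completed,
    Failed { error: String },
    Failover { error: String, cooldown_account: String },
}

/// Stand-in for `should_failover_for_gateway_error` (assumed phrases).
fn should_failover(error: &str) -> bool {
    let e = error.to_lowercase();
    e.contains("usage limit") || e.contains("insufficient_quota")
}

/// Terminal error set => stream not ok => classify => failover or plain failure.
pub fn finalize(terminal_error: Option<&str>, account_id: &str) -> Outcome {
    match terminal_error {
        None => Outcome::Completed,
        Some(err) if should_failover(err) => Outcome::Failover {
            error: err.to_string(),
            // The caller would hand this id to mark_account_cooldown; the
            // current response is already streamed, so nothing is resent.
            cooldown_account: account_id.to_string(),
        },
        Some(err) => Outcome::Failed { error: err.to_string() },
    }
}
```

For a streamed response, "failover" therefore means bookkeeping plus cooldown for the account, not a retry of the request that already reached the client.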

Fix B — `selection.rs::demote_low_quota_candidates`

Add a stable-sort pass to collect_gateway_candidates_uncached: read latest_usage_snapshots_by_account and move accounts whose used_percent or secondary_used_percent exceeds the threshold to the tail of the candidate list.

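- The threshold is configurable via the CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT environment variable; the default is 95
- No account is removed, only reordered: if every account is over the threshold, all of them are still returned as a fallback
- The sort is stable, so the existing sort order among healthy accounts is preserved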
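A minimal sketch of the demotion pass, assuming candidates are plain account-id strings and snapshots are a `HashMap` keyed by account id. `UsageSnapshot`, `low_quota_threshold`, and `is_low_quota` are hypothetical, and whether the boundary is `>` or `>=` is also an assumption (this sketch demotes at or above the threshold). `sort_by_key` on a slice is a stable sort and `false` orders before `true`, so healthy accounts keep their relative order and nothing is dropped.

```rust
use std::collections::HashMap;
use std::env;

/// Hypothetical shape of one entry from latest_usage_snapshots_by_account.
#[derive(Clone, Copy, Debug, Default)]
pub struct UsageSnapshot {
    pub used_percent: Option<f64>,
    pub secondary_used_percent: Option<f64>,
}

const DEFAULT_LOW_QUOTA_THRESHOLD: f64 = 95.0;

/// Reads CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT; falls back to 95 when the
/// variable is unset or not a finite number.
pub fn low_quota_threshold() -> f64 {
    env::var("CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT")
        .ok()
        .and_then(|v| v.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite())
        .unwrap_or(DEFAULT_LOW_QUOTA_THRESHOLD)
}

/// True if either usage window is at or above the threshold; accounts
/// without a snapshot are treated as healthy.
fn is_low_quota(snapshot: Option<&UsageSnapshot>, threshold: f64) -> bool {
    snapshot.map_or(false, |s| {
        [s.used_percent, s.secondary_used_percent]
            .iter()
            .flatten()
            .any(|p| *p >= threshold)
    })
}

/// Stable reorder: healthy accounts first, low-quota accounts at the tail,
/// each group in its original order. No account is removed.
pub fn demote_low_quota_candidates(
    candidates: &mut [String],
    snapshots: &HashMap<String, UsageSnapshot>,
    threshold: f64,
) {
    // sort_by_key is stable, and false < true for bool.
    candidates.sort_by_key(|id| is_low_quota(snapshots.get(id), threshold));
}
```

With the integration scenario listed under the tests (primary at 99%, secondary at 10%), the primary ends up behind the secondary even though its sort value is 0.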

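Session stickiness (rotate_to_bound_account) runs after Fix B: the bound account is still rotated to [0], so Fix B does not break stickiness. New sessions follow Fix B's quota preference, while existing sessions with a binding follow stickiness; the two do not conflict.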
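To illustrate why the ordering does not conflict, here is a hypothetical move-to-front version of the stickiness step applied to an already demoted list. The body is an assumption: it moves the bound account to index 0 and keeps every other account in place relative to one another, which may differ from how the real rotate_to_bound_account rotates the list.

```rust
/// Hypothetical sketch: moves the session's bound account (if it is still a
/// candidate) to index 0, leaving the relative order of the rest unchanged.
pub fn rotate_to_bound_account(candidates: &mut [String], bound: Option<&str>) {
    if let Some(bound) = bound {
        if let Some(pos) = candidates.iter().position(|id| id.as_str() == bound) {
            // Rotating the prefix [0..=pos] right by one brings `pos` to the front.
            candidates[..=pos].rotate_right(1);
        }
    }
}
```

A bound account wins index 0 even if it was demoted, while the quota-based order among the remaining accounts survives for fallback.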

Tests

8 new tests, all passing locally (the number of pre-existing failures in the workspace is unchanged):

crates/service/src/gateway/observability/http_bridge/stream_readers/passthrough.rs, 4 unit tests:
- extract_usage_limit_matches_plain_text_delta: OpenAI's apology delta frame is detected
- extract_usage_limit_matches_quota_exceeded_json: insufficient_quota / quota exceeded shapes are detected
- extract_usage_limit_ignores_unrelated_content: ordinary deltas are not flagged
- extract_usage_limit_ignores_frames_without_data: pure event/keepalive frames are not flagged
crates/service/src/gateway/routing/tests/selection_tests.rs, 2 unit tests:
- low_quota_accounts_are_demoted_to_tail: an account over the threshold on either primary or secondary usage is demoted, with stable ordering
- all_low_quota_still_returns_candidates: when every account is over the threshold, all of them are still returned
crates/service/tests/gateway_logs/usage_limit_failover.rs, 2 integration tests:
- gateway_usage_limit_in_sse_marks_request_as_failover: the mock upstream returns a text/event-stream body containing usage-limit text, a real HTTP round trip runs end to end, and the test asserts request_log.status_code == 502 and that account_id is recorded under the primary account that was hit
- gateway_low_quota_account_is_skipped_on_first_request: the primary account's snapshot is at 99% and the secondary's at 10%; even though the primary has sort=0, the request must land on the secondary

How to run:

cargo test -p codexmanager-service --lib extract_usage_limit
cargo test -p codexmanager-service --lib low_quota
cargo test -p codexmanager-service --test gateway_logs usage_limit_failover

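Compatibility

- The default threshold is 95; for existing deployments whose accounts have always stayed below 95%, behavior is unchanged
- No changes to the storage schema, the RPC protocol, or the API Key / Account fields
- In support.rs, start_mock_upstream_sequence_lenient is refactored to call the new _with_content_types variant; the public interface stays compatible (the start_mock_upstream_sequence signature is unchanged)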

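OpenAI's usage limit is sometimes returned as 200 with the text embedded in the SSE data body (without response.failed). The original path could not detect this, so the account was never put into cooldown, and the gateway kept hitting the nearly exhausted account, leaving clients repeatedly seeing "You've hit your usage limit".

- Fix A (passthrough.rs): update_usage_from_frame scans the data: body before entering any branch; on a usage_limit_reason_from_message match it sets the collector's terminal_error, so response_finalize takes the failover bookkeeping + cooldown branch
- Fix B (selection.rs): collect_gateway_candidates_uncached gains demote_low_quota_candidates, which stably demotes accounts whose used_percent (primary or secondary) exceeds the threshold to the tail of the candidate list. The threshold is configured via CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT (default 95); when every account is over the threshold, none is removed and all are still returned as a fallback

New tests: 4 passthrough unit tests, 2 selection unit tests, and 2 gateway_logs integration tests (covering the routing and bookkeeping path from real HTTP to the mock upstream), all passing; pre-existing failures on main are unchanged.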

The prior SSE usage-limit detection matches the ChatGPT frontend phrasing ("You've hit your usage limit ... try again at X") but misses the shorter backend-native phrasing that arrives via WebSocket upstreams: "The usage limit has been reached". With the pattern missing, usage_limit_reason_from_message returns None → analyze_gateway_error classifies the error as Other → no failover is triggered → the client receives a 502 while other healthy candidates are never tried (attemptedAccountIds stays empty). Add the missing pattern and extend the unit test to cover this phrasing.
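To make the effect of the added phrase concrete, here is a hypothetical phrase-table matcher. `usage_limit_reason`, its reason tags, and the exact phrase list are assumptions, not the contents of `crate::account_status::usage_limit_reason_from_message`. Matching is case-insensitive and also normalizes the typographic apostrophe (U+2019), so "You’ve" matches too.

```rust
/// Hypothetical phrase table; the real list may differ.
const USAGE_LIMIT_PATTERNS: &[(&str, &str)] = &[
    // ChatGPT frontend phrasing.
    ("you've hit your usage limit", "usage_limit_frontend"),
    // Shorter backend-native phrasing seen via WebSocket upstreams.
    ("the usage limit has been reached", "usage_limit_backend"),
    ("insufficient_quota", "insufficient_quota"),
    ("quota exceeded", "quota_exceeded"),
];

/// Returns a reason tag if `message` looks like a usage-limit error,
/// or None (which would otherwise be classified as Other).
pub fn usage_limit_reason(message: &str) -> Option<&'static str> {
    // Normalize case and the typographic apostrophe before matching.
    let normalized = message.to_lowercase().replace('\u{2019}', "'");
    USAGE_LIMIT_PATTERNS
        .iter()
        .find(|(needle, _)| normalized.contains(*needle))
        .map(|(_, reason)| *reason)
}
```

Without the backend entry, "The usage limit has been reached" falls through to None, which is exactly the gap this commit closes.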

yangbiao added 2 commits April 20, 2026 21:58

maiqigh wants to merge 2 commits into qxcnm:main from maiqigh:upstream/usage-limit-sse-failover
