
fix(gateway): detect usage-limit inside the SSE body and demote candidate accounts by quota #146

Open
maiqigh wants to merge 2 commits into qxcnm:main from maiqigh:upstream/usage-limit-sse-failover

@maiqigh maiqigh commented Apr 20, 2026

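Background

OpenAI does not always report usage-limit through event: response.failed. In some cases it returns 200 OK and streams "You've hit your usage limit. To get more access now, ..." back in SSE data frames as if it were ordinary assistant content. The HTTP SSE path currently does not recognize this form, with the following consequences:

  • bridge.stream_terminal_error is never set, so response_finalize does not take the failover path
  • the account is not passed to mark_account_cooldown / mark_account_unavailable_for_gateway_error
  • the next request is still routed to this nearly exhausted account, and the user keeps seeing "You've hit your usage limit"

infer_ws_terminal_status in responses_websocket.rs already performs similar detection (crate::account_status::usage_limit_reason_from_message), but the HTTP SSE passthrough path has no equivalent handling. In addition, list_gateway_candidates only filters on used_percent < 100, while in practice OpenAI starts rejecting at 95-99%, so healthy accounts do not get ranked first.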

Closes #118

Changes

Fix A — passthrough.rs::update_usage_from_frame

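Before any other branch, scan the data: body; on a usage_limit_reason_from_message match, set collector.saw_terminal = true and collector.terminal_error = Some(payload). The existing early-return branch, inspection.usage.is_none() && inspection.terminal.is_none(), is bypassed by frames that carry delta text (a delta does a get_or_insert of a Some usage), so the scan has to come first.

Afterwards, in response_finalize: bridge.is_ok(is_stream) = false → final_error = Some(usage-limit payload) → should_failover_for_gateway_error returns true → the FinalizeUpstreamResponseOutcome::Failover + mark_account_cooldown path is taken. The bytes of the current request have already been streamed to the client and cannot be rolled back, but the account is recorded and put into cooldown, so the next request skips it.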
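
To illustrate the ordering in Fix A, here is a minimal sketch of a scan that runs ahead of every other branch. The `Collector` struct, `frame_data`, `scan_frame_for_usage_limit`, and the simplified `usage_limit_reason` matcher are all hypothetical stand-ins, not the code in this PR; only the two collector fields are taken from the description above.

```rust
/// Hypothetical stand-in for the real stream collector state.
#[derive(Debug, Default)]
struct Collector {
    saw_terminal: bool,
    terminal_error: Option<String>,
}

/// Simplified stand-in for `usage_limit_reason_from_message`.
/// The phrase list is illustrative only.
fn usage_limit_reason(message: &str) -> Option<&'static str> {
    let lower = message.to_lowercase();
    if lower.contains("you've hit your usage limit") {
        Some("usage_limit")
    } else if lower.contains("insufficient_quota") || lower.contains("quota exceeded") {
        Some("quota_exceeded")
    } else {
        None
    }
}

/// Joins the payload of every `data:` line in one SSE frame.
/// Returns `None` for frames without data (event-only or keepalive).
fn frame_data(frame: &str) -> Option<String> {
    let parts: Vec<&str> = frame
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        .map(str::trim_start)
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n"))
    }
}

/// Meant to be called before any usage/terminal inspection, so that a frame
/// which also carries delta text cannot bypass the check.
/// Returns true when the frame was flagged as a usage-limit terminal error.
fn scan_frame_for_usage_limit(collector: &mut Collector, frame: &str) -> bool {
    let payload = match frame_data(frame) {
        Some(payload) => payload,
        None => return false,
    };
    if usage_limit_reason(&payload).is_none() {
        return false;
    }
    collector.saw_terminal = true;
    collector.terminal_error = Some(payload);
    true
}
```

Because the scan only sets flags on the collector, the bytes can keep flowing to the client unchanged; the failover decision is left to the finalize step.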

Fix B — selection.rs::demote_low_quota_candidates

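Add a stable sort pass in collect_gateway_candidates_uncached: read latest_usage_snapshots_by_account and move accounts whose used_percent or secondary_used_percent exceeds the threshold to the tail of the candidate list.

  • The threshold is configured through the environment variable CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT and defaults to 95
  • No account is removed, only reordered; if every account is over the threshold, all of them are still returned
  • The sort is stable, so the existing sort order among healthy accounts is preserved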
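
The demotion described above can be sketched as a stable sort on a boolean key. This is a minimal sketch under assumptions: the `UsageSnapshot` shape, the use of `String` account ids, the inclusive `>=` comparison, and treating accounts without a snapshot as healthy are all guesses for illustration, not details confirmed by the PR.

```rust
use std::collections::HashMap;

const LOW_QUOTA_ENV: &str = "CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT";
const DEFAULT_LOW_QUOTA_THRESHOLD: f64 = 95.0;

/// Hypothetical snapshot shape; only the two percentages matter here.
#[derive(Clone, Copy)]
struct UsageSnapshot {
    used_percent: Option<f64>,
    secondary_used_percent: Option<f64>,
}

/// Parses the threshold, falling back to the default on missing or invalid input.
fn parse_threshold(raw: Option<&str>) -> f64 {
    raw.and_then(|v| v.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(DEFAULT_LOW_QUOTA_THRESHOLD)
}

/// Reads the threshold from the environment variable named in the PR.
fn threshold_from_env() -> f64 {
    parse_threshold(std::env::var(LOW_QUOTA_ENV).ok().as_deref())
}

/// True when either window is at or over the threshold.
/// Whether the real comparison is inclusive is an assumption.
fn is_low_quota(snapshot: &UsageSnapshot, threshold: f64) -> bool {
    let over = |p: Option<f64>| p.map_or(false, |v| v >= threshold);
    over(snapshot.used_percent) || over(snapshot.secondary_used_percent)
}

/// Reorders in place without removing anything. `sort_by_key` is stable, so
/// healthy accounts (key = false) keep their relative order and come first,
/// and low-quota accounts (key = true) keep theirs at the tail.
/// Accounts with no snapshot are treated as healthy (assumption).
fn demote_low_quota_candidates(
    candidates: &mut [String],
    snapshots: &HashMap<String, UsageSnapshot>,
    threshold: f64,
) {
    candidates.sort_by_key(|id| {
        snapshots
            .get(id)
            .map_or(false, |s| is_low_quota(s, threshold))
    });
}
```

Sorting on a boolean key rather than filtering gives the fallback behavior for free: when every account is over the threshold, all keys are equal and the list comes back in its original order.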

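Session stickiness (rotate_to_bound_account) runs after Fix B: the bound account is still rotated to [0], so Fix B does not break stickiness. New sessions follow Fix B's quota preference, existing bound sessions follow stickiness, and the two do not conflict.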
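
A small sketch of why running stickiness after demotion keeps both behaviors. The function below is a hypothetical stand-in that shares only its name with the real `rotate_to_bound_account`; how the real one moves the bound account to index 0 is not shown in this PR, so the partial `rotate_right` here is an assumption.

```rust
/// Moves the session-bound account, if present, to index 0 and keeps the
/// relative order of every other candidate. Hypothetical stand-in.
fn rotate_to_bound_account(candidates: &mut [String], bound: Option<&str>) {
    let pos = match bound {
        Some(id) => candidates.iter().position(|c| c.as_str() == id),
        None => None,
    };
    if let Some(pos) = pos {
        // Rotating the prefix [0..=pos] right by one brings the bound
        // account to the front without disturbing the tail.
        candidates[..=pos].rotate_right(1);
    }
}
```

With a list already demoted to `[b, d, a, c]`, a session bound to the low-quota account `a` ends up with `[a, b, d, c]`, while a new session with no binding keeps the quota-preferred order.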

Tests

8 new tests, all green locally (the number of pre-existing failures in the workspace is unchanged):

  • crates/service/src/gateway/observability/http_bridge/stream_readers/passthrough.rs, 4 unit tests:
    • extract_usage_limit_matches_plain_text_delta: OpenAI's apology delta frame is detected
    • extract_usage_limit_matches_quota_exceeded_json: the insufficient_quota / quota exceeded form
    • extract_usage_limit_ignores_unrelated_content: normal deltas are not flagged
    • extract_usage_limit_ignores_frames_without_data: pure event/keepalive frames are not flagged
  • crates/service/src/gateway/routing/tests/selection_tests.rs, 2 unit tests:
    • low_quota_accounts_are_demoted_to_tail: an account is demoted when either primary or secondary crosses the threshold, with stable ordering
    • all_low_quota_still_returns_candidates: all candidates are still returned when every account is over the threshold
  • crates/service/tests/gateway_logs/usage_limit_failover.rs, 2 integration tests:
    • gateway_usage_limit_in_sse_marks_request_as_failover: the mock upstream embeds usage-limit text in a text/event-stream body, the request goes through a real HTTP round trip, and the test asserts request_log.status_code == 502 with account_id recorded under the primary account that was hit
    • gateway_low_quota_account_is_skipped_on_first_request: the primary account's snapshot is at 99% and the secondary account's at 10%; even though the primary has sort=0, the request must land on the secondary

How to run:

cargo test -p codexmanager-service --lib extract_usage_limit
cargo test -p codexmanager-service --lib low_quota
cargo test -p codexmanager-service --test gateway_logs usage_limit_failover

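Compatibility

  • The default threshold is 95; if accounts in an existing deployment always stay below 95%, behavior is unchanged
  • No changes to the storage schema, the RPC protocol, or the API Key / Account fields
  • In support.rs, start_mock_upstream_sequence_lenient is refactored to call the new _with_content_types variant; the public interface stays compatible (the start_mock_upstream_sequence signature is unchanged)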

yangbiao added 2 commits April 20, 2026 21:58
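OpenAI sometimes returns usage-limit as text embedded in a 200 + SSE data body
(not via response.failed). The existing path does not detect it, the account
is not marked for cooldown, and the gateway keeps hitting the nearly exhausted
account, so the client repeatedly sees "You've hit your usage limit".

- Fix A (passthrough.rs): update_usage_from_frame scans the data: body before
  entering any branch; on a usage_limit_reason_from_message match it sets the
  collector's terminal_error so that response_finalize takes the failover
  accounting + cooldown branch
- Fix B (selection.rs): collect_gateway_candidates_uncached adds
  demote_low_quota_candidates; accounts whose used_percent (primary or
  secondary) exceeds the threshold are stably demoted to the tail of the
  candidate list. The threshold is configured through
  CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT and defaults to 95; when every
  account is over the threshold, none is removed and all are still returned

New tests: 4 passthrough unit tests, 2 selection unit tests, and 2 gateway_logs
integration tests (covering the routing and accounting path from real HTTP to
the mock upstream). All green; pre-existing failures on main are unchanged.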
The prior SSE usage-limit detection matches the ChatGPT frontend phrasing
("You've hit your usage limit ... try again at X") but misses the shorter
backend-native phrasing that arrives via WebSocket upstreams:
"The usage limit has been reached".

With the pattern missing, usage_limit_reason_from_message returns None →
analyze_gateway_error classifies the error as Other → no failover is
triggered → the client receives a 502 while other healthy candidates are
never tried (attemptedAccountIds stays empty).

Add the missing pattern and extend the unit test to cover this phrasing.
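
As a rough illustration of the second commit, the sketch below matches both phrasings and feeds the result into a failover decision. `USAGE_LIMIT_PATTERNS`, `usage_limit_reason_sketch`, `GatewayErrorKind`, `classify`, and `should_failover` are hypothetical names; the real `usage_limit_reason_from_message` and `analyze_gateway_error` may have different signatures and pattern lists.

```rust
/// Illustrative phrase list, lowercase for case-insensitive matching.
const USAGE_LIMIT_PATTERNS: &[&str] = &[
    "you've hit your usage limit",      // ChatGPT frontend phrasing
    "the usage limit has been reached", // shorter backend-native phrasing
];

/// Returns the matched pattern, or `None` when the message is unrelated.
fn usage_limit_reason_sketch(message: &str) -> Option<&'static str> {
    let lower = message.to_lowercase();
    USAGE_LIMIT_PATTERNS
        .iter()
        .copied()
        .find(|p| lower.contains(*p))
}

/// Hypothetical two-way classification of an upstream error message.
#[derive(Debug, PartialEq)]
enum GatewayErrorKind {
    UsageLimit,
    Other,
}

fn classify(message: &str) -> GatewayErrorKind {
    match usage_limit_reason_sketch(message) {
        Some(_) => GatewayErrorKind::UsageLimit,
        None => GatewayErrorKind::Other,
    }
}

/// Only usage-limit errors move the request on to the next candidate here.
fn should_failover(kind: &GatewayErrorKind) -> bool {
    *kind == GatewayErrorKind::UsageLimit
}
```

Keeping the phrasings in one list means a newly observed upstream wording needs a single added entry plus a test case, which is the shape of the change this commit describes.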

Development

Successfully merging this pull request may close these issues.

claude code round-robin failure (original title: claude code轮询失败)
