fix(gateway): detect usage-limit in the SSE body and demote candidate accounts by quota #146
Open
maiqigh wants to merge 2 commits into qxcnm:main
added 2 commits
April 20, 2026 21:58
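OpenAI sometimes returns a usage-limit error as a 200 response with the text embedded in the SSE data body, instead of emitting response.failed. The existing path does not recognize this form, so the account is never marked for cooldown and the gateway keeps routing to nearly exhausted accounts, which makes clients see "You've hit your usage limit" again and again.

- Fix A (passthrough.rs): update_usage_from_frame scans the data: body before entering any branch. When usage_limit_reason_from_message matches, it sets the collector's terminal_error so that response_finalize takes the failover accounting + cooldown branch.
- Fix B (selection.rs): collect_gateway_candidates_uncached gains demote_low_quota_candidates, which stably demotes accounts whose used_percent (primary or secondary) exceeds the threshold to the tail of the candidate list. The threshold is configured via CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT and defaults to 95. When every account hits the threshold, none is dropped: all accounts are still returned as a fallback.

New tests: 4 passthrough unit tests, 2 selection unit tests, and 2 gateway_logs integration tests (covering the routing and accounting path from real HTTP to a mock upstream). All pass; the pre-existing failures on main are unchanged.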
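The Fix A scan described in this commit can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in passthrough.rs: Collector, looks_like_usage_limit, data_payload, and scan_frame_for_usage_limit are hypothetical names, and the stand-in matcher covers only two illustrative phrasings, whereas the real path calls usage_limit_reason_from_message.

```rust
/// Hypothetical stand-in for the collector state mentioned in the commit.
#[derive(Debug, Default)]
struct Collector {
    saw_terminal: bool,
    terminal_error: Option<String>,
}

/// Hypothetical stand-in for `usage_limit_reason_from_message`.
/// Assumption: case-insensitive substring matching on two sample phrasings.
fn looks_like_usage_limit(message: &str) -> bool {
    let lower = message.to_lowercase();
    lower.contains("you've hit your usage limit") || lower.contains("insufficient_quota")
}

/// Joins the payloads of every `data:` line in one SSE frame.
/// Returns `None` for frames without data lines (pure event or keepalive frames).
fn data_payload(frame: &str) -> Option<String> {
    let parts: Vec<&str> = frame
        .lines()
        .filter_map(|line| line.strip_prefix("data:"))
        // SSE allows one optional space after the colon.
        .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
        .collect();
    if parts.is_empty() {
        None
    } else {
        Some(parts.join("\n"))
    }
}

/// Runs the usage-limit scan before any other frame handling, so that
/// frames carrying ordinary delta text are inspected as well.
fn scan_frame_for_usage_limit(collector: &mut Collector, frame: &str) {
    if let Some(payload) = data_payload(frame) {
        if looks_like_usage_limit(&payload) {
            collector.saw_terminal = true;
            collector.terminal_error = Some(payload);
        }
    }
    // The existing usage / terminal branches would follow here.
}
```

Once terminal_error is set, finalization can treat the stream as failed even though the upstream status was 200.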
The prior SSE usage-limit detection matches the ChatGPT frontend phrasing
("You've hit your usage limit ... try again at X") but misses the shorter
backend-native phrasing that arrives via WebSocket upstreams:
"The usage limit has been reached".
With the pattern missing, usage_limit_reason_from_message returns None →
analyze_gateway_error classifies the error as Other → no failover is
triggered → the client receives a 502 while other healthy candidates are
never tried (attemptedAccountIds stays empty).
Add the missing pattern and extend the unit test to cover this phrasing.
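To illustrate why the missing pattern matters, here is a hedged sketch of a matcher that covers both phrasings. UsageLimitReason, PATTERNS, usage_limit_reason, and should_failover are hypothetical names; the real usage_limit_reason_from_message and analyze_gateway_error may use a different return type and a larger pattern set.

```rust
/// Hypothetical reason type; the repository's real return type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UsageLimitReason {
    /// ChatGPT frontend phrasing: "You've hit your usage limit ...".
    Frontend,
    /// Shorter backend-native phrasing: "The usage limit has been reached".
    Backend,
}

/// Illustrative patterns only, stored lowercase for case-insensitive matching.
const PATTERNS: [(&str, UsageLimitReason); 2] = [
    ("you've hit your usage limit", UsageLimitReason::Frontend),
    ("the usage limit has been reached", UsageLimitReason::Backend),
];

/// Returns the first matching reason, or `None` when no pattern matches.
fn usage_limit_reason(message: &str) -> Option<UsageLimitReason> {
    let lower = message.to_lowercase();
    PATTERNS
        .iter()
        .find(|(needle, _)| lower.contains(*needle))
        .map(|(_, reason)| *reason)
}

/// Simplified assumption: only a recognized usage-limit message triggers
/// failover here; an unrecognized message falls through as "Other".
fn should_failover(message: &str) -> bool {
    usage_limit_reason(message).is_some()
}
```

Without the second entry in PATTERNS, the backend phrasing would return None and no other candidate would be tried.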
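Background
OpenAI's usage-limit does not always arrive as event: response.failed. In some cases it is streamed back as 200 OK + an SSE data frame, with "You've hit your usage limit. To get more access now, ..." delivered as ordinary assistant content. The HTTP SSE path currently does not recognize this form, which means:
- bridge.stream_terminal_error is not set → response_finalize does not take the failover path
- mark_account_cooldown / mark_account_unavailable_for_gateway_error are never reached
infer_ws_terminal_status in responses_websocket.rs already performs similar detection (crate::account_status::usage_limit_reason_from_message), but the HTTP SSE passthrough path has no equivalent handling. In addition, list_gateway_candidates only filters on used_percent < 100, while in practice OpenAI starts rejecting at 95-99%, so healthy accounts do not get ranked first.
Closes #118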
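Changes
Fix A: passthrough.rs::update_usage_from_frame
Before any other branch, scan the data: body. When usage_limit_reason_from_message matches, set collector.saw_terminal = true and collector.terminal_error = Some(payload). Frames that carry delta text bypass the existing early-return branch (inspection.usage.is_none() && inspection.terminal.is_none()), because a delta makes get_or_insert produce a Some usage, so the scan has to sit ahead of every branch.
Afterwards, in response_finalize: bridge.is_ok(is_stream) = false → final_error = Some(usage-limit payload) → should_failover_for_gateway_error returns true → the FinalizeUpstreamResponseOutcome::Failover + mark_account_cooldown path is taken. The bytes of the current request have already been streamed to the client and cannot be rolled back, but the account is recorded and put into cooldown, so the next request skips it.
Fix B: selection.rs::demote_low_quota_candidates
collect_gateway_candidates_uncached gains a stable sort pass: it reads latest_usage_snapshots_by_account and moves accounts whose used_percent or secondary_used_percent exceeds the threshold to the tail of the candidate list.
- The threshold is configured via CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT and defaults to 95
Session stickiness (rotate_to_bound_account) runs after Fix B: the bound account is still rotated to [0], so Fix B does not break stickiness. New sessions follow Fix B's quota preference, and existing sessions with a binding follow stickiness. The two do not conflict.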
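The interaction between Fix B and session stickiness can be sketched as below. This is an illustrative sketch, not the code in selection.rs: Candidate, UsageSnapshot, parse_threshold, is_low_quota, demote_low_quota, and rotate_bound_to_front are hypothetical names. Two details are assumptions: an account counts as low quota when usage is at or above the threshold (the description does not say whether the comparison is strict), and an account without a snapshot is treated as healthy.

```rust
use std::collections::HashMap;

/// Hypothetical candidate shape; the real struct carries more fields.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    account_id: String,
}

/// Hypothetical usage snapshot with primary and secondary windows.
#[derive(Debug, Clone, Copy)]
struct UsageSnapshot {
    used_percent: Option<f64>,
    secondary_used_percent: Option<f64>,
}

const DEFAULT_LOW_QUOTA_THRESHOLD_PERCENT: f64 = 95.0;

/// Parses a raw threshold value, falling back to the default of 95 when the
/// value is missing or invalid (the fallback rule is an assumption).
fn parse_threshold(raw: Option<&str>) -> f64 {
    raw.and_then(|v| v.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite() && *v > 0.0)
        .unwrap_or(DEFAULT_LOW_QUOTA_THRESHOLD_PERCENT)
}

/// Reads the threshold from the environment variable named in this PR.
fn threshold_from_env() -> f64 {
    let raw = std::env::var("CODEXMANAGER_LOW_QUOTA_THRESHOLD_PERCENT").ok();
    parse_threshold(raw.as_deref())
}

/// Assumption: `>=` comparison, and a missing snapshot means "not low".
fn is_low_quota(snapshot: Option<&UsageSnapshot>, threshold: f64) -> bool {
    snapshot.map_or(false, |s| {
        [s.used_percent, s.secondary_used_percent]
            .iter()
            .flatten()
            .any(|used| *used >= threshold)
    })
}

/// Stable demotion: `sort_by_key` is stable and `false < true`, so healthy
/// accounts keep their relative order in front and nothing is removed.
fn demote_low_quota(
    candidates: &mut Vec<Candidate>,
    snapshots: &HashMap<String, UsageSnapshot>,
    threshold: f64,
) {
    candidates.sort_by_key(|c| is_low_quota(snapshots.get(&c.account_id), threshold));
}

/// Runs after demotion: moves the session-bound account to index 0 while
/// keeping the order of the other candidates.
fn rotate_bound_to_front(candidates: &mut [Candidate], bound_account_id: &str) {
    if let Some(pos) = candidates.iter().position(|c| c.account_id == bound_account_id) {
        candidates[..=pos].rotate_right(1);
    }
}
```

Because the rotation runs last, a bound account that was demoted for low quota still ends up at index 0, which matches the ordering described above.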
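Tests
8 new tests, all passing locally (the number of pre-existing failures in the workspace is unchanged):
crates/service/src/gateway/observability/http_bridge/stream_readers/passthrough.rs, 4 unit tests:
- extract_usage_limit_matches_plain_text_delta: OpenAI's apology delta frame is recognized
- extract_usage_limit_matches_quota_exceeded_json: the insufficient_quota / quota exceeded form
- extract_usage_limit_ignores_unrelated_content: normal deltas are not falsely matched
- extract_usage_limit_ignores_frames_without_data: pure event/keepalive frames are not falsely matched
crates/service/src/gateway/routing/tests/selection_tests.rs, 2 unit tests:
- low_quota_accounts_are_demoted_to_tail: hitting the threshold on either primary or secondary demotes the account, and ordering stays stable
- all_low_quota_still_returns_candidates: when every account hits the threshold, all of them are still returned
crates/service/tests/gateway_logs/usage_limit_failover.rs, 2 integration tests:
- gateway_usage_limit_in_sse_marks_request_as_failover: the mock upstream embeds usage-limit text in a text/event-stream body and a real HTTP request runs the full path, asserting request_log.status_code == 502 and that account_id is recorded under the primary account that was hit
- gateway_low_quota_account_is_skipped_on_first_request: the primary account's snapshot is at 99% and the secondary account's at 10%; even though the primary has sort=0, the request must land on the secondary
How to run: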
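Compatibility
In support.rs, start_mock_upstream_sequence_lenient is refactored to call the new _with_content_types variant. The public interface stays compatible (the signature of start_mock_upstream_sequence is unchanged).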