Catch and retry rate limited responses in read-only GraphQL queries#1531
Catch and retry rate limited responses in read-only GraphQL queries#1531
Conversation
Unit Test Results 1 files 1 suites 25s ⏱️ Results for commit e56e2ec. ♻️ This comment has been updated with latest results. |
b2ffc2c to
e56e2ec
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves robustness of GEI’s read-only GraphQL polling by adding secondary rate-limit detection/retry handling to GraphQL POST calls, aligning it more closely with existing REST retry behavior.
Changes:
- Added
PostGraphQLWithRetryAsynctoGithubClientto detect GraphQL secondary rate-limit responses and trigger the existing secondary-rate-limit retry path. - Updated
GithubApi.GetMigrationto use the new GraphQL-with-retry client method. - Updated unit tests/mocks to reflect the new client method and to reduce test runtime.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Octoshift/Services/GithubClient.cs | Adds GraphQL secondary rate-limit detection and a new GraphQL POST helper that integrates with retry logic. |
| src/Octoshift/Services/GithubApi.cs | Switches migration state polling to use the new GraphQL-with-retry path. |
| src/OctoshiftCLI.Tests/Octoshift/Services/GithubApiTests.cs | Updates mocks to expect PostGraphQLWithRetryAsync instead of PostGraphQLAsync. |
| src/OctoshiftCLI.Tests/Octoshift/Services/GithubClientTests.cs | Adjusts tests to shorten delays by overriding the default backoff delay. |
Comments suppressed due to low confidence (2)
src/Octoshift/Services/GithubClient.cs:377
- HandleSecondaryRateLimit now uses _secondaryRateLimitMaxRetries for the stop condition, but the exception message and log message still print SECONDARY_RATE_LIMIT_MAX_RETRIES. If the max is overridden (tests or future config), the thrown/logged “(attempt x/3)” and “Maximum retries (3)” will be incorrect. Use the same variable for the messages as the check uses.
if (retryCount >= _secondaryRateLimitMaxRetries)
{
throw new OctoshiftCliException($"Secondary rate limit exceeded. Maximum retries ({SECONDARY_RATE_LIMIT_MAX_RETRIES}) reached. Please wait before retrying your request.");
}
var delaySeconds = GetSecondaryRateLimitDelay(headers, retryCount);
_log.LogWarning($"Secondary rate limit detected (attempt {retryCount + 1}/{SECONDARY_RATE_LIMIT_MAX_RETRIES}). Waiting {delaySeconds} seconds before retrying...");
src/OctoshiftCLI.Tests/Octoshift/Services/GithubClientTests.cs:2210
- Same issue here: setting _secondaryRateLimitDefaultDelay to 0 makes the test assertions stop exercising the exponential backoff calculation across retries. A no-op delay hook (or scaled-down delays) would allow validating the backoff values without slowing the test suite.
var githubClient = new GithubClient(_mockOctoLogger.Object, httpClient, null, _retryPolicy, _dateTimeProvider.Object, PERSONAL_ACCESS_TOKEN)
{
// Set short default delay to speed up the test. This is reflected in the logs below.
_secondaryRateLimitDefaultDelay = 0
};
// Act & Assert
await FluentActions
.Invoking(async () => await githubClient.DeleteAsync("http://example.com"))
.Should()
.ThrowExactlyAsync<OctoshiftCliException>()
.WithMessage("Secondary rate limit exceeded. Maximum retries (3) reached. Please wait before retrying your request.");
// Verify all retry attempts were logged
_mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 1/3). Waiting 0 seconds before retrying..."), Times.Once);
_mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 2/3). Waiting 0 seconds before retrying..."), Times.Once);
_mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 3/3). Waiting 0 seconds before retrying..."), Times.Once);
| var (response, headers) = await PostWithRetry(url, body, customHeaders); | ||
|
|
||
| if (IsGraphQLSecondaryRateLimit(response)) | ||
| { | ||
| (response, _) = await HandleSecondaryRateLimit(HttpMethod.Post, url, body, HttpStatusCode.OK, customHeaders, headers, retryCount); | ||
| } | ||
|
|
||
| var data = JObject.Parse(response); | ||
| EnsureSuccessGraphQLResponse(data); | ||
|
|
||
| return data; |
There was a problem hiding this comment.
PostGraphQLWithRetryAsync only checks for GraphQL secondary rate limiting once. HandleSecondaryRateLimit retries via SendAsync, but SendAsync won’t detect GraphQL rate-limit responses (HTTP 200 with errors[].type == RATE_LIMIT), so a second rate-limited response will fall through to EnsureSuccessGraphQLResponse and throw instead of continuing to retry up to the max. Consider looping/re-invoking the GraphQL rate-limit check on every response (and using PostWithRetry for the subsequent attempts so transient exceptions still get retried).
| var (response, headers) = await PostWithRetry(url, body, customHeaders); | |
| if (IsGraphQLSecondaryRateLimit(response)) | |
| { | |
| (response, _) = await HandleSecondaryRateLimit(HttpMethod.Post, url, body, HttpStatusCode.OK, customHeaders, headers, retryCount); | |
| } | |
| var data = JObject.Parse(response); | |
| EnsureSuccessGraphQLResponse(data); | |
| return data; | |
| var currentRetryCount = retryCount; | |
| while (true) | |
| { | |
| var (response, headers) = await PostWithRetry(url, body, customHeaders); | |
| if (IsGraphQLSecondaryRateLimit(response)) | |
| { | |
| (response, headers) = await HandleSecondaryRateLimit(HttpMethod.Post, url, body, HttpStatusCode.OK, customHeaders, headers, currentRetryCount); | |
| if (IsGraphQLSecondaryRateLimit(response)) | |
| { | |
| currentRetryCount++; | |
| continue; | |
| } | |
| } | |
| var data = JObject.Parse(response); | |
| EnsureSuccessGraphQLResponse(data); | |
| return data; | |
| } |
| return await _retryPolicy.Retry(async () => | ||
| { | ||
| var data = await _client.PostGraphQLAsync(url, payload); | ||
| var data = await _client.PostGraphQLWithRetryAsync(url, payload); | ||
|
|
There was a problem hiding this comment.
GetMigration now wraps _retryPolicy.Retry around PostGraphQLWithRetryAsync, but PostGraphQLWithRetryAsync already retries via PostWithRetry/_retryPolicy. This can multiply retries and backoff delays (and increase load) in production. Consider removing the outer Retry here (or switching to PostGraphQLAsync if you want the retry to live only at the GithubApi layer).
See below for a potential fix:
var data = await _client.PostGraphQLWithRetryAsync(url, payload);
return (
State: (string)data["data"]["node"]["state"],
RepositoryName: (string)data["data"]["node"]["repositoryName"],
WarningsCount: (int)data["data"]["node"]["warningsCount"],
FailureReason: (string)data["data"]["node"]["failureReason"],
MigrationLogUrl: (string)data["data"]["node"]["migrationLogUrl"]);
| var githubClient = new GithubClient(_mockOctoLogger.Object, httpClient, null, _retryPolicy, _dateTimeProvider.Object, PERSONAL_ACCESS_TOKEN) | ||
| { | ||
| // Set short default delay to speed up the test. This is reflected in the logs below. | ||
| _secondaryRateLimitDefaultDelay = 0 | ||
| }; | ||
|
|
||
| // Act | ||
| var result = await githubClient.PatchAsync("http://example.com", _rawRequestBody); | ||
|
|
||
| // Assert | ||
| result.Should().Be("SUCCESS_RESPONSE"); | ||
| _mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 1/3). Waiting 60 seconds before retrying..."), Times.Once); | ||
| _mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 2/3). Waiting 120 seconds before retrying..."), Times.Once); | ||
| _mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 1/3). Waiting 0 seconds before retrying..."), Times.Once); | ||
| _mockOctoLogger.Verify(m => m.LogWarning("Secondary rate limit detected (attempt 2/3). Waiting 0 seconds before retrying..."), Times.Once); |
There was a problem hiding this comment.
These tests set _secondaryRateLimitDefaultDelay to 0 to avoid waiting, but that also stops validating the exponential backoff behavior (the assertions now expect 0 seconds for every attempt). Consider keeping the test fast while still asserting the computed backoff (e.g., inject/override the delay mechanism so the code doesn’t actually sleep, while logging/asserting 60/120/etc or scaled-down values like 1/2/4).
This issue also appears on line 2194 of the same file.
Closes https://github.ghe.com/github/octoshift/issues/11503!
This PR makes GraphQL queries include both general retries and rate limiting! This makes read-only GraphQL queries have the same robustness as other GET calls in the GEI CLI.
Demo
Here is a migration in progress, handling rate limiting:
ThirdPartyNotices.txt(if applicable)