Skip to content

feat(transport): add HTTP retry with exponential backoff#1520

Merged
jpnurmi merged 120 commits intomasterfrom
jpnurmi/feat/http-retry
Mar 31, 2026
Merged

feat(transport): add HTTP retry with exponential backoff#1520
jpnurmi merged 120 commits intomasterfrom
jpnurmi/feat/http-retry

Conversation

@jpnurmi
Copy link
Copy Markdown
Collaborator

@jpnurmi jpnurmi commented Feb 13, 2026

Add HTTP retry with exponential backoff for network failures, modeled after Crashpad's upload retry behavior.

Note

The new http_retry option was made opt-in to avoid a behavioral change.

Note

Adds an explicit 15s connect timeout to both curl and WinHTTP, matching the value used by Crashpad. This reduces the time the transport worker is blocked on unreachable hosts, previously ~75s on curl (OS TCP SYN retry limit) and 60s on WinHTTP.

Failed envelopes are stored as <db>/cache/<ts>-<n>-<uuid>.envelope and retried on startup after a 100ms throttle, and then with exponential backoff (15min, 30min, 1h, 2h, 4h, 8h). When retries are exhausted, and offline caching is enabled, envelopes are stored as <db>/cache/<uuid>.envelope instead of being discarded.

flowchart TD
    startup --> sretry{retry?}
    sretry -->|yes| throttle
    throttle -. 100ms .-> send
    send -->|success| discard
    send -->|fail| retry{retry?}
    retry -->|no| cache{cache?}
    cache -->|no| discard
    cache -->|yes| cache-dump[&lt;db&gt;/cache/<br/>&lt;uuid&gt;.envelope]
    retry -->|yes| cache-retry[&lt;db&gt;/cache/<br/>&lt;ts&gt;-&lt;n&gt;-&lt;uuid&gt;.envelope]
    cache-retry --> backoff
    backoff -. 2ⁿ×15min .-> send
Loading

Builds upon:

See also:

Closes: #1316

@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 13, 2026

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 14a0aef

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch 2 times, most recently from b083a57 to a264f66 Compare February 13, 2026 17:47
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@sentry review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@cursor review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@sentry review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@cursor review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@cursor review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@cursor review

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch 2 times, most recently from 243c880 to d6aa792 Compare February 16, 2026 19:41
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 16, 2026

@cursor review

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch 2 times, most recently from fbbffb2 to abd5815 Compare February 17, 2026 08:34
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 17, 2026

@cursor review

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch from f030cf9 to 22e3fc4 Compare February 17, 2026 10:24
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 17, 2026

@sentry review

@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 17, 2026

@cursor review

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch from 22e3fc4 to ffce486 Compare February 17, 2026 10:49
@jpnurmi
Copy link
Copy Markdown
Collaborator Author

jpnurmi commented Feb 17, 2026

@cursor review

@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch from b133fd6 to 93db900 Compare March 17, 2026 16:16
cursoragent and others added 2 commits March 17, 2026 17:16
Use a progress callback (CURLOPT_XFERINFOFUNCTION) that checks an
atomic shutdown flag. When http_transport_shutdown_timeout fires,
it sets the flag via curl_client_shutdown, causing curl_easy_perform
to abort promptly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

jpnurmi and others added 3 commits March 30, 2026 16:23
Replace retry_dump_queue with retry_seal, leveraging the new bgworker
draining flag from upstream. Retry envelopes are already persisted to
disk by retry_enqueue, so they don't need explicit dumping — only
in-memory http_send_task envelopes are dumped by http_dump_queue.

On shutdown timeout: flush cleanup tasks + seal retry to prevent a
detached worker from writing duplicate envelopes.

Disable http_retry in the crash daemon — it's a one-shot process that
sends once and exits.

Delegate plain cache writes (retry_count < 0) to the new
sentry__envelope_write_to_cache for minidump splitting.

Add no-http-retry to all cache_keep integration test runs to ensure
they exercise the cache_keep code path, not the retry path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Retain backwards-compatible behavior by defaulting http_retry to false.
Add "http-retry" flag to the example app for opting in. Retry
integration tests now pass "http-retry" explicitly, while cache tests
keep "no-http-retry" for when the default is eventually flipped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The envelope tag (next_tag, envelope->tag, sentry__envelope_get_tag)
and sealed_tag on retry were only used by retry_dump_cb, which was
replaced by retry_seal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi force-pushed the jpnurmi/feat/http-retry branch from 0c94acd to a485974 Compare March 30, 2026 14:48
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

jpnurmi and others added 4 commits March 30, 2026 17:00
The else branch clearing retry_func on retry allocation failure had no
effect: http_send_task decides retry vs cache based on state->retry
(the pointer), and http_transport_retry already no-ops when it's NULL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When cache_keep was enabled and retries were exhausted, handle_result
would move the envelope to cache format regardless of whether the send
succeeded. Only cache on network failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline single-use env dicts and use plain `env` where reused.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When bgworker shutdown times out, remaining queued envelopes are dumped
to disk. With retry enabled, these should be written to cache as retry
files so they get retried on next startup. The merge that replaced
retry_dump_queue with retry_seal lost this: dump_queue was writing to
the run directory unconditionally.

Write directly to cache via sentry__run_write_cache, bypassing the
sealed retry_enqueue since the seal is already set at this point.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

jpnurmi and others added 9 commits March 30, 2026 17:45
Revert http_dump_task_cb change — dumped envelopes belong in *.run/
and get picked up by process_old_runs on next startup. Fix tests to
count envelopes in both cache/ (retry) and *.run/ (shutdown dump),
since slow transports may not process all sends before the timeout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The -00-/-01- assertions verified "stop on first failure" (only one
envelope gets bumped per retry cycle). On slow transports like WinHTTP
(~2s per connect failure), the startup poll may never fire during the
second run because old-run envelope sends block the bgworker until
shutdown. The retry file from the first run stays unprocessed.

Retry bumping behavior is already covered by
test_http_retry_multiple_attempts (single envelope, deterministic).
This test focuses on total envelope preservation across cache/ and
*.run/, which is the meaningful invariant for multiple envelopes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
retry_enqueue transitioned the state from STARTUP to RUNNING to prevent
the startup poll from re-processing same-session envelopes via timestamp
collision. This was needed when timestamps had second precision, but
since the switch to millisecond precision (sentry__usec_time / 1000),
the collision window is 1ms — negligible.

The transition caused two problems:
- The startup poll used backoff mode instead of immediate-send for
  pre-existing retry files from previous sessions
- retry_flush_task during shutdown became a no-op, skipping high-count
  files entirely

Removing the transition lets the startup poll and flush task work as
designed: pre-existing retry files are sent promptly on startup and
flushed on shutdown regardless of same-session enqueue activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The -00-/-01- assertions depend on the startup poll or flush task
processing the retry file from the previous run. On slow transports
(WinHTTP ~2s per connect failure), the flush task is queued behind
old-run send tasks and the bgworker exits (draining) before reaching
it. The retry file is never bumped.

Retry count bumping is deterministically tested by
test_http_retry_multiple_attempts (single envelope). This test
verifies no envelopes are lost across cache/ and *.run/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restore the STARTUP→RUNNING transition in retry_enqueue — removing it
caused the startup poll to re-process same-session envelopes (files
written by enqueue with ts == startup_time got picked up and bumped).

The transition is needed to prevent re-processing, but it made
retry_flush_task a no-op during shutdown (state already RUNNING).
Fix by making flush unconditional: its job is to send all pending
retry files before shutdown, regardless of whether the startup poll
already ran.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use startup_time instead of UINT64_MAX to filter files in flush task.
UINT64_MAX bypassed all filtering and picked up envelopes written by
retry_enqueue during the same session, bumping them from -00- to -01-.

With startup_time, the flush only processes files from previous
sessions (ts < startup_time), matching the startup poll's behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Swap http_send_envelope parameter order to match
sentry_retry_send_func_t so it can be passed directly to
sentry__retry_start without a wrapper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
write_envelope is used by run_write_envelope and run_write_external
which never need retry format. Move the retry path building into
sentry__run_write_cache using sentry__run_make_cache_path, removing
duplicate snprintf logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Leftover from the envelope tag mechanism removed in a485974.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jpnurmi jpnurmi requested a review from mujacica March 31, 2026 07:17
@jpnurmi jpnurmi merged commit 72e1d42 into master Mar 31, 2026
87 of 89 checks passed
@jpnurmi jpnurmi deleted the jpnurmi/feat/http-retry branch March 31, 2026 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Offline Caching of Envelopes

4 participants