
vault: gracefully handle individual blob broadcast failures in Observation (backport 2.39.2) #21781

Merged
prashantkumar1982 merged 3 commits into release/2.39.2 from vault/graceful-blob-broadcast-failures-2.39.2
Mar 30, 2026


@prashantkumar1982

Backport of #21765 to release/2.39.2.

Summary

During the Observation phase, pending queue payloads are broadcast as blobs in parallel. Previously, if any single broadcast failed, the entire observation was aborted — stalling the OCR round.

This changes the behavior so that individual failures are isolated: a failed broadcast is logged as a warning (with the request ID and error) and that payload is excluded from PendingQueueItems. All remaining payloads continue normally.

Each BroadcastBlob call is given a 2-second timeout so a single slow broadcast cannot stall the entire batch. Parent context cancellation/deadline errors are propagated immediately for fail-fast semantics.

Made with Cursor

…ation

Previously, if any single payload failed to broadcast as a blob during the
Observation phase, the entire observation was aborted and returned an error.
This is unnecessarily disruptive — one problematic payload (e.g. transient
network issue, malformed data) would prevent all other valid payloads from
being included in the observation, stalling the OCR round.

Now, individual broadcast failures are logged as warnings (with the request
ID and error details) and the failed payload is simply excluded from
PendingQueueItems. The remaining payloads continue to be broadcast and
observed normally.

The blob broadcast logic is extracted into a dedicated
broadcastBlobPayloads method for clarity.

Made-with: Cursor

Check ctx.Err() when BroadcastBlob fails so that context.Canceled and
context.DeadlineExceeded are returned immediately rather than swallowed.
This preserves fail-fast semantics for expired OCR rounds while still
skipping item-specific transient errors.

Made-with: Cursor

Each parallel BroadcastBlob call now gets a 2-second timeout derived from
the parent context. A slow individual broadcast will be cancelled and
skipped without stalling the rest of the batch. Parent context
cancellation still propagates immediately for round-level failures.

Made-with: Cursor
@github-actions

✅ No conflicts with other open PRs targeting release/2.39.2

@github-actions

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.


@trunk-io

trunk-io bot commented Mar 30, 2026


@prashantkumar1982 prashantkumar1982 merged commit 0a3188d into release/2.39.2 Mar 30, 2026
211 checks passed
@prashantkumar1982 prashantkumar1982 deleted the vault/graceful-blob-broadcast-failures-2.39.2 branch March 30, 2026 19:01
