
DD 7.3: Batch drain relocationComplete to prevent fetchKeysComplete OOM #12993

Open

saintstack wants to merge 6 commits into apple:release-7.3 from saintstack:dd_7.3_starvation

Conversation

@saintstack (Contributor) commented Apr 14, 2026

The DDQueue choose loop processes one event per iteration. When
dataTransferComplete and other events are frequent, relocationComplete
processing is starved: fetchKeysComplete entries (erased only in the
relocationComplete handler) accumulate without bound. On one cluster this
reached 33,593 entries, causing an OOM.

Extract completion processing into DDQueue::processRelocationComplete()
and batch-drain all ready relocationComplete events after the first
waitNext, capped at 1000 per iteration to avoid hogging the event loop.
This keeps fetchKeysComplete bounded regardless of event interleaving.

20260415-043932-stack_centos7_all_starvation-db5246251777f6a compressed=True data_size=51146319 duration=4673620 ended=99999 fail=1 fail_fast=10 max_runs=100000 pass=99998 priority=100 remaining=0:00:00 runtime=0:52:07 sanity=False started=100000 submitted=20260415-043932 timeout=5400 username=stack_centos7_all_starvation

The failure was RandomSeed="4014465418" SourceVersion="5bfd01720280600535c8bedf5c4bd2cbd4da453d" Time="1776228920" BuggifyEnabled="1" DeterminismCheck="0" FaultInjectionEnabled="1" TestFile="tests/fast/GetMappedRange.toml"

michael stack added 2 commits April 14, 2026 16:35
The DDQueue choose loop processes one event per iteration. When
dataTransferComplete and other events are frequent, relocationComplete
processing is starved — fetchKeysComplete entries (erased only in the
relocationComplete handler) accumulate unboundedly. On p67 this reached
33,593 entries causing OOM.

Extract completion processing into DDQueue::processRelocationComplete()
and batch-drain all ready relocationComplete events after the first
waitNext, capped at 1000 per iteration to avoid hogging the event loop.
This keeps fetchKeysComplete bounded regardless of event interleaving.

The done variable from waitNext is const RelocateData. Use a separate
non-const variable for the drain loop.
Copilot AI (Contributor) left a comment


Pull request overview

This PR addresses a DDQueue event-loop starvation issue where relocationComplete processing can be delayed by other frequent events (e.g. dataTransferComplete), allowing fetchKeysComplete to grow without bound and potentially cause OOM. It refactors relocation-completion handling into a helper and batch-drains ready completion events to keep fetchKeysComplete bounded.

Changes:

  • Extracted relocation completion bookkeeping into DDQueue::processRelocationComplete().
  • Added a bounded batch-drain loop to process additional ready relocationComplete events (up to a fixed cap) in one choose-branch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • fdbserver/include/fdbserver/DDRelocationQueue.h: Declares the new processRelocationComplete() helper on DDQueue.
  • fdbserver/DDRelocationQueue.actor.cpp: Implements processRelocationComplete() and adds a bounded drain loop to reduce starvation/backlog.


  • Comment thread: fdbserver/DDRelocationQueue.actor.cpp (Outdated)
  • Comment thread: fdbserver/DDRelocationQueue.actor.cpp (Outdated)
@foundationdb-ci (Contributor)

Result of foundationdb-pr-73 on Linux RHEL 9

  • Commit ID: 0b7a802
  • Duration 0:39:47
  • Result: ❌ FAILED
  • Error: Error while executing command: ninja -v -C build_output -j ${NPROC} all packages strip_targets. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

2 similar comments

michael stack added 4 commits April 14, 2026 17:40
Avoid creating temporary FutureStream objects in the drain loop.
getFuture() adds/removes ref counts on each call which may interact
poorly with the actor compiler.

Using getFuture().pop() on temporary FutureStream objects caused a hang.
Store the FutureStream once as a state variable and use it for both
waitNext in the choose block and isReady()/pop() in the drain loop.

Each noErrorActors.add(tag(delay(0)...)) in the drain loop schedules
an immediate task. With many completions drained at once, this floods
the task queue and causes the event loop to spend all time on system
monitor callbacks (getResidentMemoryUsage) instead of making progress.

Keep the delay(0) for the first completion (original behavior) but skip
it for batch-drained completions. The key cleanup (fetchKeysComplete
erase, activeRelocations decrement) still happens immediately.

Use post-increment (drained++ < 1000) to drain up to 1000 completions
instead of 999.
@foundationdb-ci (Contributor)

Result of foundationdb-pr-73 on Linux RHEL 9

  • Commit ID: 3fe923d
  • Duration 0:55:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

2 similar comments



3 participants