
refactor(uffd): drain pagefault events into a batch per poll cycle #2445

Draft
ValentaTomas wants to merge 1 commit into main from refactor/uffd-batch-read-events

Conversation


@ValentaTomas ValentaTomas commented Apr 18, 2026

Extracted from #1896. Pure structural refactor of the UFFD event loop, no behaviour change.

`Userfaultfd.Serve` previously read exactly one event per poll/read cycle. This change drains all queued events into a slice (until EAGAIN) and processes them as a batch in the same iteration.

Why: the upcoming UFFD_EVENT_REMOVE handling needs REMOVE and PAGEFAULT events from the same poll cycle delivered together, so REMOVEs can be applied to the page tracker before the matching faults run.

Same EINTR/EAGAIN handling, same WRITE/read branching, same wg.Go dispatch, same faultPage signature, same eagain/noData counters. The only structural addition is the outer for _, pagefault := range pagefaults loop, which accounts for most of the diff (indentation churn).

Note: the arm64-tests / packages/shared failure is the unrelated TestChangeResponseHeader flake fixed by #2443; this PR doesn't touch packages/shared.


cursor bot commented Apr 18, 2026

PR Summary

Medium Risk
Refactors the Userfaultfd.Serve event loop in a low-level memory fault handler; while intended as behavior-preserving, subtle differences in EAGAIN/EINTR handling or batch ordering could affect page-fault processing under load.

Overview
Userfaultfd.Serve now drains all queued UFFD_EVENT_PAGEFAULT messages into a batch (reading until the non-blocking read returns EAGAIN) and then processes that batch in one poll cycle, instead of handling only a single event per poll/read iteration. Counter logging and error handling were adjusted accordingly, and the unexpected-flag comment was clarified to document why synchronous WP/MINOR faults aren't expected.

Reviewed by Cursor Bugbot for commit a1a4dae. Bugbot is set up for automated code reviews on this repo.

@ValentaTomas ValentaTomas force-pushed the refactor/uffd-batch-read-events branch from ce956f4 to 0c877e1 Compare April 18, 2026 21:58
@ValentaTomas ValentaTomas changed the base branch from feat/firecracker-v1.14 to main April 18, 2026 21:58

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce956f4c94


Comment on lines +133 to +134
if n == 0 {
break

P1: Return EOF from readEvents instead of silently breaking

In readEvents, n == 0 is handled as a normal loop termination, but for userfaultfd this indicates EOF/closed descriptor (the userfaultfd(2) example treats this as a fatal condition). Returning an empty batch here makes Serve continue its poll loop without surfacing closure, which can leave the handler stuck (or spinning) after the uffd is closed during teardown/failure paths. This should be propagated as an explicit error or shutdown signal.



msg := (*UffdMsg)(unsafe.Pointer(&buf[0]))

if event := getMsgEvent(msg); event != UFFD_EVENT_PAGEFAULT {

When a non-PAGEFAULT event is encountered mid-batch, `return nil, ErrUnexpectedEventType` silently discards any pagefaults already accumulated in `pagefaults`. Once UFFD_EVENT_REMOVE is registered in the follow-up PR, a batch like [PF1, PF2, REMOVE, PF3] will cause readEvents to drop PF1 and PF2 before the caller ever sees them — those guest memory accesses will never be resolved, hanging the VM.

Consider returning the partial slice alongside the error: return pagefaults, ErrUnexpectedEventType, and let Serve decide whether to flush what was collected before propagating the error.

Comment on lines +130 to +137
// Read returned 0 bytes, meaning the writing end has been closed.
// This should not happen in practice unless something (us or
// Firecracker) has closed the fd.
if n == 0 {
break
}

msg := (*UffdMsg)(unsafe.Pointer(&buf[0]))

🔴 In readEvents(), when syscall.Read returns n=0 (EOF / fd closed), the function silently breaks and returns (nil, nil) instead of propagating an error. Back in Serve(), unix.Poll() on a closed uffd fd keeps returning POLLIN (and POLLHUP), but POLLHUP only increments a counter and never causes Serve() to return—resulting in a 100% CPU spin loop with no diagnostic output and no clean shutdown. Fix by returning a sentinel error (e.g. io.EOF or a named ErrFdClosed) when n==0 so Serve() can exit cleanly.

Extended reasoning...

The bug: silent swallowing of EOF in readEvents().

When the userfaultfd file descriptor is closed (e.g. because Firecracker exits or crashes), the kernel raises POLLHUP on the fd. The POLLIN bit may also be set depending on kernel version and whether the waitqueues are still active. In Serve(), the POLLHUP flag is handled only by incrementing uffdErrorCounter—it never causes Serve() to return or break the loop. If POLLIN is not set, the !hasEvent check fires a continue, spinning back to poll() immediately; poll() returns POLLHUP again; infinite loop at 100% CPU.

If POLLIN is set alongside POLLHUP, readEvents() is called. Inside readEvents(), syscall.Read returns n=0, err=nil (the standard EOF behavior for a closed fd). The code reaches the if n == 0 { break } guard, breaks out of the drain loop, and returns (pagefaults, nil)—an empty slice with no error.

Why the existing code does not prevent the spin.

Serve() receives (nil, nil) from readEvents(). The for _, pf := range pagefaults loop is a no-op on an empty slice. Control falls through to the end of the outer for loop and jumps back to unix.Poll(). Since the fd closure is permanent, poll() immediately returns POLLIN (or POLLHUP) again on the next call. This repeats indefinitely, consuming a full CPU core with no log output and no way to observe that anything is wrong.

Step-by-step proof.

  1. Firecracker exits; the kernel closes its handle on the uffd fd.
  2. unix.Poll(pollFds, -1) returns immediately with uffdFd.Revents = POLLIN | POLLHUP.
  3. uffdErrorCounter.Increase(POLLHUP) is called; Serve() does not return.
  4. hasEvent(uffdFd.Revents, unix.POLLIN) is true → readEvents(ctx) is called.
  5. syscall.Read returns n=0, err=nil → the if n == 0 { break } fires → readEvents returns (nil, nil).
  6. for _, pf := range nil is a no-op → outer loop falls through to step 2.
  7. Steps 2–6 repeat without pause at 100% CPU.

The only external escape would be fdExit.Reader() independently signaling POLLIN on pollFds[1], but if Firecracker crashed without the orchestrator's process monitor firing Stop(), that signal never arrives.

How to fix.

Return a sentinel error from readEvents() when n==0:

Serve() already returns on any non-nil error from readEvents(), so this single change is sufficient to break the spin. Optionally add a u.logger.Warn before returning so operators can observe the event in logs.

@ValentaTomas ValentaTomas marked this pull request as draft April 18, 2026 22:16
@ValentaTomas ValentaTomas force-pushed the refactor/uffd-batch-read-events branch from 0c877e1 to 4c356cc Compare April 18, 2026 22:19
@ValentaTomas ValentaTomas changed the title refactor(uffd): drain pagefault events into a batch via readEvents() refactor(uffd): drain pagefault events into a batch per poll cycle Apr 18, 2026

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 4c356cc.

// There is no error so we can proceed.

eagainCounter.Log(ctx)
noDataCounter.Log(ctx)

eagainCounter now always outputs a constant value of 1

Low Severity

eagainCounter lost its diagnostic value in this refactor. Every drain loop now terminates with exactly one EAGAIN (the normal "queue empty" signal), so Increase is called once, then Log immediately prints and resets. The counter always outputs count: 1 on every poll cycle, producing constant debug-log noise with zero information. The PR description itself notes this counter "no longer makes sense" and states it is dropped, but it was retained.

Additional Locations (1)


@ValentaTomas ValentaTomas force-pushed the refactor/uffd-batch-read-events branch from 4c356cc to d76e1ff Compare April 18, 2026 22:52
Instead of reading exactly one UFFD event per poll/read cycle, drain all
queued events into a slice (until EAGAIN) and process them as a batch.

Structural prerequisite for upcoming UFFD_EVENT_REMOVE handling: REMOVE and
PAGEFAULT events for the same address need to be delivered from the same
poll cycle so REMOVEs can be applied to the page tracker before the
matching faults run.

No behaviour change for the current PAGEFAULT-only registration: the same
EINTR/EAGAIN handling, the same WRITE/read branching, the same `wg.Go`
dispatch, the same `faultPage` signature, and the same eagain/noData
counters.
@ValentaTomas ValentaTomas force-pushed the refactor/uffd-batch-read-events branch from d76e1ff to a1a4dae Compare April 18, 2026 23:18
