refactor(uffd): drain pagefault events into a batch per poll cycle#2445
refactor(uffd): drain pagefault events into a batch per poll cycle#2445ValentaTomas wants to merge 1 commit intomainfrom
Conversation
PR SummaryMedium Risk Overview Reviewed by Cursor Bugbot for commit a1a4dae. Bugbot is set up for automated code reviews on this repo. Configure here. |
ce956f4 to
0c877e1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce956f4c94
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if n == 0 { | ||
| break |
There was a problem hiding this comment.
Return EOF from readEvents instead of silently breaking
In readEvents, n == 0 is handled as a normal loop termination, but for userfaultfd this indicates EOF/closed descriptor (the userfaultfd(2) example treats this as a fatal condition). Returning an empty batch here makes Serve continue its poll loop without surfacing closure, which can leave the handler stuck (or spinning) after the uffd is closed during teardown/failure paths. This should be propagated as an explicit error or shutdown signal.
Useful? React with 👍 / 👎.
|
|
||
| msg := (*UffdMsg)(unsafe.Pointer(&buf[0])) | ||
|
|
||
| if event := getMsgEvent(msg); event != UFFD_EVENT_PAGEFAULT { |
There was a problem hiding this comment.
When a non-PAGEFAULT event is encountered mid-batch, return nil, ErrUnexpectedEventType silently discards any pagefaults already accumulated in pagefaults. Once UFFD_EVENT_REMOVE is registered in the follow-up PR, a batch like [PF1, PF2, REMOVE, PF3] will cause readEvents to drop PF1 and PF2 before the caller ever sees them — those guest memory accesses will never be resolved, hanging the VM.
Consider returning the partial slice alongside the error: return pagefaults, ErrUnexpectedEventType, and let Serve decide whether to flush what was collected before propagating the error.
| // Read returned 0 bytes, meaning the writing end has been closed. | ||
| // This should not happen in practice unless something (us or | ||
| // Firecracker) has closed the fd. | ||
| if n == 0 { | ||
| break | ||
| } | ||
|
|
||
| msg := (*UffdMsg)(unsafe.Pointer(&buf[0])) |
There was a problem hiding this comment.
🔴 In readEvents(), when syscall.Read returns n=0 (EOF / fd closed), the function silently breaks and returns (nil, nil) instead of propagating an error. Back in Serve(), unix.Poll() on a closed uffd fd keeps returning POLLIN (and POLLHUP), but POLLHUP only increments a counter and never causes Serve() to return—resulting in a 100% CPU spin loop with no diagnostic output and no clean shutdown. Fix by returning a sentinel error (e.g. io.EOF or a named ErrFdClosed) when n==0 so Serve() can exit cleanly.
Extended reasoning...
The bug: silent swallowing of EOF in readEvents().
When the userfaultfd file descriptor is closed (e.g. because Firecracker exits or crashes), the kernel raises POLLHUP on the fd. The POLLIN bit may also be set depending on kernel version and whether the waitqueues are still active. In Serve(), the POLLHUP flag is handled only by incrementing uffdErrorCounter—it never causes Serve() to return or break the loop. If POLLIN is not set, the !hasEvent check fires a continue, spinning back to poll() immediately; poll() returns POLLHUP again; infinite loop at 100% CPU.
If POLLIN is set alongside POLLHUP, readEvents() is called. Inside readEvents(), syscall.Read returns n=0, err=nil (the standard EOF behavior for a closed fd). The code reaches the if n == 0 { break } guard, breaks out of the drain loop, and returns (pagefaults, nil)—an empty slice with no error.
Why the existing code does not prevent the spin.
Serve() receives (nil, nil) from readEvents(). The for _, pf := range pagefaults loop is a no-op on an empty slice. Control falls through to the end of the outer for loop and jumps back to unix.Poll(). Since the fd closure is permanent, poll() immediately returns POLLIN (or POLLHUP) again on the next call. This repeats indefinitely, consuming a full CPU core with no log output and no way to observe that anything is wrong.
Step-by-step proof.
- Firecracker exits; the kernel closes its handle on the uffd fd.
- unix.Poll(pollFds, -1) returns immediately with uffdFd.Revents = POLLIN | POLLHUP.
- uffdErrorCounter.Increase(POLLHUP) is called; Serve() does not return.
- hasEvent(uffdFd.Revents, unix.POLLIN) is true → readEvents(ctx) is called.
- syscall.Read returns n=0, err=nil → the if n == 0 { break } fires → readEvents returns (nil, nil).
- for _, pf := range nil is a no-op → outer loop falls through to step 2.
- Steps 2–6 repeat without pause at 100% CPU.
The only external escape would be fdExit.Reader() independently signaling POLLIN on pollFds[1], but if Firecracker crashed without the orchestrator's process monitor firing Stop(), that signal never arrives.
How to fix.
Return a sentinel error from readEvents() when n==0:
Serve() already returns on any non-nil error from readEvents(), so this single change is sufficient to break the spin. Optionally add a u.logger.Warn before returning so operators can observe the event in logs.
0c877e1 to
4c356cc
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4c356cc. Configure here.
| // There is no error so we can proceed. | ||
|
|
||
| eagainCounter.Log(ctx) | ||
| noDataCounter.Log(ctx) |
There was a problem hiding this comment.
eagainCounter now always outputs constant value of 1
Low Severity
eagainCounter lost its diagnostic value in this refactor. Every drain loop now terminates with exactly one EAGAIN (the normal "queue empty" signal), so Increase is called once, then Log immediately prints and resets. The counter always outputs count: 1 on every poll cycle, producing constant debug-log noise with zero information. The PR description itself notes this counter "no longer makes sense" and states it is dropped, but it was retained.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 4c356cc. Configure here.
4c356cc to
d76e1ff
Compare
Instead of reading exactly one UFFD event per poll/read cycle, drain all queued events into a slice (until EAGAIN) and process them as a batch. Structural prerequisite for upcoming UFFD_EVENT_REMOVE handling: REMOVE and PAGEFAULT events for the same address need to be delivered from the same poll cycle so REMOVEs can be applied to the page tracker before the matching faults run. No behaviour change for the current PAGEFAULT-only registration: the same EINTR/EAGAIN handling, the same WRITE/read branching, the same `wg.Go` dispatch, the same `faultPage` signature, and the same eagain/noData counters.
d76e1ff to
a1a4dae
Compare


Extracted from #1896. Pure structural refactor of the UFFD event loop, no behaviour change.
Userfaultfd.Servepreviously read exactly one event per poll/read cycle. This drains all queued events into a slice (until EAGAIN) and processes them as a batch in the same iteration.Why: the upcoming
UFFD_EVENT_REMOVEhandling needs REMOVE and PAGEFAULT events from the same poll cycle delivered together, so REMOVEs can be applied to the page tracker before the matching faults run.Same EINTR/EAGAIN handling, same WRITE/read branching, same
wg.Godispatch, samefaultPagesignature, same eagain/noData counters. The only structural addition is the outerfor _, pagefault := range pagefaultsloop, which accounts for most of the diff (indentation churn).Note: the
arm64-tests / packages/sharedfailure is the unrelatedTestChangeResponseHeaderflake fixed by #2443; this PR doesn't touchpackages/shared.