Skip to content

feat(orch): re-enable memory.peak per-FD reset (cgroups v2)#2430

Draft
arkamar wants to merge 3 commits intomainfrom
feat/memory-peak-per-fd-reset
Draft

feat(orch): re-enable memory.peak per-FD reset (cgroups v2)#2430
arkamar wants to merge 3 commits intomainfrom
feat/memory-peak-per-fd-reset

Conversation

@arkamar
Copy link
Copy Markdown
Contributor

@arkamar arkamar commented Apr 17, 2026

Now that we run on kernel 6.12+ (ubuntu24), re-enable per-FD peak reset for memory.peak. Each GetStats() call reads the interval peak and writes "0" to reset it, giving per-sample granularity instead of lifetime-only tracking.

Restores: 8cf9d74 ("fix: disable memory.peak per-FD reset (requires kernel 6.12+) (#1959)")

Now that we run on kernel 6.12+ (ubuntu24), re-enable per-FD peak
reset for memory.peak. Each GetStats() call reads the interval peak
and writes "0" to reset it, giving per-sample granularity instead
of lifetime-only tracking.

Restores: 8cf9d74 ("fix: disable memory.peak per-FD reset (requires kernel 6.12+) (#1959)")
@cursor
Copy link
Copy Markdown

cursor bot commented Apr 17, 2026

PR Summary

Medium Risk
Changes low-level cgroup stats collection to write-reset memory.peak on each sample, which can affect resource accounting accuracy and may behave differently across kernel/config variations.

Overview
This PR switches cgroups v2 memory.peak tracking from a lifetime peak to an interval peak by keeping a writable memory.peak FD open, reading it during GetStats(), and writing to it to reset the counter for the next sample. It also tightens parsing/error handling, adjusts logging to surface when reset isn’t available, updates the host stats comment to reflect interval semantics, and revises the cgroup integration test to validate per-sample peak behavior rather than monotonic lifetime growth.

Reviewed by Cursor Bugbot for commit 34d1e0d. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit f615ca8. Configure here.

Comment thread packages/orchestrator/pkg/sandbox/cgroup/manager_test.go
Comment thread packages/orchestrator/pkg/sandbox/cgroup/manager.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/cgroup/manager.go Outdated
Comment thread packages/orchestrator/pkg/sandbox/cgroup/manager.go
arkamar added 2 commits April 18, 2026 09:08
When the O_RDWR open fails, memoryPeakFile is nil and peak tracking
is skipped entirely (MemoryPeakBytes stays 0). The old message
claimed "will track lifetime peak", which was false. Corrected the
message and promoted it from Debug to Warn, since this silently
disables peak tracking and should be visible in production logs.
If ParseUint fails, the function was returning (0, nil) — a bogus
zero peak that looks valid to the caller — and then resetting the
kernel peak counter, permanently losing the real data with no error
signal. Now return an error instead, so the caller knows the read
failed and the kernel peak is preserved for the next attempt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants