perf(sandbox): reduce memory dirtying from journald and envd logging by ValentaTomas · Pull Request #2423 · e2b-dev/infra

ValentaTomas · 2026-04-17T01:44:49Z

Summary

Add journald Storage=none drop-in inside guest VMs to prevent journal writes from dirtying memory pages during pause-resume cycles
Remove stdout writer from envd in FC mode — debug logs still ship via HTTP exporter to the orchestrator, but never touch journald/disk inside the VM
Add measure-memory-dirtying.sh benchmark script for measuring diff sizes across pause-resume cycles

Motivation

During frequent pause-resume cycles, journald and envd stdout were constantly writing to memory, dirtying pages that then had to be captured in every snapshot diff layer. This increased layer sizes and slowed snapshot restore.

Changes

journald.conf.tpl (new): Storage=none, MaxLevelConsole=warning, MaxLevelKMsg=warning — system errors still reach the kernel console, but nothing is stored in journal files
logger.go: In FC mode, envd writes only to the HTTP exporter (debug level preserved). In non-FC mode, stdout + debug unchanged
envd version: Bumped to 0.5.15
rootfs_test.go: Updated file count and added assertions for journald conf
measure-memory-dirtying.sh: Benchmark script that chains build → pause-resume cycles and prints memfile/rootfs diff sizes

Test plan

Unit test passes (TestAdditionalOCILayers)
Run benchmark on staging machine with GCS access to measure before/after diff sizes
Verify envd debug logs still arrive at orchestrator via HTTP exporter

- Add journald.conf drop-in with Storage=none to prevent journal writes from dirtying guest memory pages during pause-resume cycles - Remove stdout writer from envd in FC mode — debug logs ship via HTTP exporter only, never hitting journald/disk inside the VM - Add measure-memory-dirtying.sh script to benchmark diff sizes across pause-resume cycles with the orchestrator CLI tools - Bump envd version to 0.5.15

cursor · 2026-04-17T01:44:56Z

PR Summary

Medium Risk
Changes guest logging behavior and journald configuration, which could reduce local observability or mask early-boot/logging failures if the HTTP exporter is unavailable. The rest is additive/testing-only, but validating log delivery in FC mode is important.

Overview
Reduces pause/resume snapshot diff sizes by disabling persistent journald storage in the guest (via a journald drop-in) and by stopping envd from writing logs to stdout in Firecracker mode so debug logs only go through the HTTP exporter; it also adds a helper script to benchmark dirtying across repeated pause/resume cycles, bumps the envd version, and updates rootfs tests to assert the new journald config is included.

^{Reviewed by Cursor Bugbot for commit 0576ecd. Bugbot is set up for automated code reviews on this repo. Configure here.}

claude · 2026-04-17T01:48:59Z

+		// HTTP exporter only — stdout goes to journald which dirties guest memory pages.
+		exporters = append(exporters, exporter.NewHTTPLogsExporter(ctx, isNotFC, mmdsChan))
 	}
+	level = zerolog.DebugLevel


var level zerolog.Level initializes to 0, which equals zerolog.DebugLevel. The unconditional level = zerolog.DebugLevel outside the if-else is dead code — it reassigns the same zero value. If the intent is per-branch level selection in the future, the assignment belongs inside each branch; as written it just adds noise.

claude · 2026-04-17T01:49:07Z

+#
+# Requires: root, KVM, Docker, NBD, hugepages
+set -euo pipefail
+


The paths ./packages/orchestrator/cmd/create-build and ./packages/orchestrator/cmd/resume-build are relative to the repo root, but the usage comment says sudo ./measure-memory-dirtying.sh — implying the caller is in the script's own directory, where those paths do not exist. The script should either document that it must be run from the repo root, or resolve it dynamically.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0576ecd549

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T01:49:24Z

+CREATE_BUILD="go run ./packages/orchestrator/cmd/create-build"
+RESUME_BUILD="go run ./packages/orchestrator/cmd/resume-build"


Resolve go-run paths from the script directory

The script’s documented usage (sudo ./measure-memory-dirtying.sh) implies running it from packages/orchestrator/cmd/resume-build, but these hard-coded commands use repo-root-relative paths, so invocation from the script directory resolves to non-existent paths and the benchmark fails before step 1 starts. This makes the new benchmark unusable unless callers happen to run it from the repository root.

Useful? React with 👍 / 👎.

claude

The logger.go change removes stdout from envd in FC mode — worth a human verifying that the HTTP exporter reliably captures all debug logs before the stdout path is dropped entirely.

Extended reasoning...

Overview

This PR touches five files: a new journald drop-in config template, a logger change removing stdout from envd in FC (Firecracker) mode, a version bump, a test update, and a new developer benchmark script.

Security risks

No security-sensitive code paths are touched. The changes are scoped to logging configuration inside guest VMs and a developer utility script.

Level of scrutiny

The journald config, version bump, test update, and benchmark script are straightforward. The production-relevant change is in logger.go: in FC mode, envd previously wrote to both the HTTP exporter and stdout; now it writes only to the HTTP exporter. stdout inside a guest VM goes through journald, so removing it is the correct mechanism to prevent memory dirtying — but it does reduce log visibility if the HTTP exporter path ever fails silently. The test plan item for verifying HTTP exporter delivery is still unchecked.

Other factors

The two bugs found are nits: incomplete test assertions for the journald config (missing [Journal] header check) and a misleading Usage comment in the benchmark script. Neither is blocking. The overall approach is sound and well-motivated.

claude · 2026-04-17T01:54:22Z

 		assert.Equal(t, disabledContent, actualFiles["etc/systemd/system/systemd-networkd.service.d/override.conf"])
+
+		journaldConf := strings.TrimSpace(actualFiles["etc/systemd/journald.conf.d/e2b.conf"])
+		assert.Contains(t, journaldConf, "Storage=none")
+		assert.Contains(t, journaldConf, "MaxLevelConsole=warning")
 	})


🟡 The journald conf test only checks Storage=none and MaxLevelConsole=warning, omitting the [Journal] section header and ForwardToSyslog=no. Without the [Journal] header, systemd silently ignores the entire drop-in file — a regression that accidentally removes it would make Storage=none a complete no-op in production while the test still passes. Consider using assert.Equal to match the full config content, as done for disabledContent elsewhere in this test.

Extended reasoning...

The new test assertions cover only two of the five settings produced by journald.conf.tpl: Storage=none and MaxLevelConsole=warning. Three settings go unchecked: [Journal] section header, ForwardToSyslog=no, and MaxLevelKMsg=warning.

The [Journal] section header is the most critical omission. systemd's drop-in parser uses an INI-style format where every directive must appear under a named section header. A .conf file that contains only bare key=value lines with no [Journal] header will be accepted by the parser without error but all its directives will be silently ignored. This means that if a future refactor accidentally removed the [Journal] line from journald.conf.tpl, the entire optimization (Storage=none, ForwardToSyslog=no, etc.) would become a no-op in production — journald would continue writing to disk and dirtying guest memory pages during every pause-resume cycle.

The current template (journald.conf.tpl) does correctly include [Journal] on line 4 and ForwardToSyslog=no on line 8, so production behaviour is correct today. This is purely a test coverage weakness: the assertions are not strict enough to catch a regression that removes the section header.

Step-by-step proof of the regression scenario:

A future change accidentally removes [Journal] from journald.conf.tpl, leaving only bare key=value lines.

The template still renders a file containing Storage=none and MaxLevelConsole=warning as literal strings.

The test calls assert.Contains(t, journaldConf, "Storage=none") — this passes, because the string is present.

The test calls assert.Contains(t, journaldConf, "MaxLevelConsole=warning") — this also passes.

The test suite is green. But the drop-in file has no [Journal] section header.

Inside the guest VM, systemd reads the file and discards all directives because there is no enclosing section. journald falls back to its default Storage=auto behaviour and continues writing to the journal, dirtying memory pages on every pause-resume cycle.

The fix is straightforward: either add assert.Contains(t, journaldConf, "[Journal]") (and for ForwardToSyslog=no), or use assert.Equal to match the full expected content — which is the pattern already established by the disabledContent check immediately above these lines.

claude · 2026-04-17T01:54:22Z

+STORAGE="${1:-.local-build}"
+CREATE_BUILD="go run ./packages/orchestrator/cmd/create-build"
+RESUME_BUILD="go run ./packages/orchestrator/cmd/resume-build"


🟡 The go run paths in measure-memory-dirtying.sh are relative to CWD and only work when invoked from the repo root, but the Usage comment (sudo ./measure-memory-dirtying.sh) implies running from the script's own directory where those paths don't exist. Either update the Usage comment to show repo-root invocation (e.g. sudo ./packages/orchestrator/cmd/resume-build/measure-memory-dirtying.sh) or add a cd $(dirname "$0")/../../.. at the top to make the script location-independent.

Extended reasoning...

The script at packages/orchestrator/cmd/resume-build/measure-memory-dirtying.sh defines:

CREATE_BUILD="go run ./packages/orchestrator/cmd/create-build" RESUME_BUILD="go run ./packages/orchestrator/cmd/resume-build"

These paths are relative to the current working directory at the time the script is invoked — not relative to the script file itself. The Usage comment says:

sudo ./measure-memory-dirtying.sh [storage-path]

The ./ prefix in ./measure-memory-dirtying.sh conventionally implies the user is already in the script's directory (packages/orchestrator/cmd/resume-build/). If a developer follows that convention literally:

Step-by-step proof of failure:

Developer runs: cd packages/orchestrator/cmd/resume-build && sudo ./measure-memory-dirtying.sh

CWD is now packages/orchestrator/cmd/resume-build/

CREATE_BUILD expands to: go run ./packages/orchestrator/cmd/create-build

Go looks for packages/orchestrator/cmd/resume-build/packages/orchestrator/cmd/create-build — which does not exist

go run fails with a "no Go files" or "cannot find package" error

The RESUME_BUILD variable is equally broken under this scenario: go run ./packages/orchestrator/cmd/resume-build would resolve to packages/orchestrator/cmd/resume-build/packages/orchestrator/cmd/resume-build, which also does not exist.

The script only works correctly when invoked from the repo root (e.g. sudo ./packages/orchestrator/cmd/resume-build/measure-memory-dirtying.sh), but the Usage comment does not document this requirement.

The fix is either: (a) update the Usage comment to require repo-root invocation, or (b) add cd "$(dirname "$0")/../../.." && ... at the top of the script to resolve all paths relative to the repo root regardless of where the caller is located. Option (b) is more robust for a developer utility.

This is a developer-facing benchmark script, not production code, so the severity is nit — it will cause an immediately obvious failure rather than silent data corruption.

ValentaTomas requested review from dobrac and jakubno as code owners April 17, 2026 01:44

e2b-request-same-site-reviewers bot assigned djeebus Apr 17, 2026

ValentaTomas marked this pull request as draft April 17, 2026 01:45

claude bot reviewed Apr 17, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

claude bot reviewed Apr 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(sandbox): reduce memory dirtying from journald and envd logging#2423

perf(sandbox): reduce memory dirtying from journald and envd logging#2423
ValentaTomas wants to merge 1 commit intomainfrom
perf/reduce-sandbox-memory-dirtying

ValentaTomas commented Apr 17, 2026

Uh oh!

cursor bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

claude bot Apr 17, 2026

Uh oh!

claude bot Apr 17, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Uh oh!

claude bot left a comment

Uh oh!

claude bot Apr 17, 2026

Uh oh!

claude bot Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		CREATE_BUILD="go run ./packages/orchestrator/cmd/create-build"
		RESUME_BUILD="go run ./packages/orchestrator/cmd/resume-build"

Conversation

ValentaTomas commented Apr 17, 2026

Summary

Motivation

Changes

Test plan

Uh oh!

cursor bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

Uh oh!

claude bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

claude bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cursor bot commented Apr 17, 2026 •

edited

Loading