
fix(JSONL): honor start/end offsets for string input in parseChunk#28025

Open
Jarred-Sumner wants to merge 1 commit into main from jarred/fix-jsonl

Conversation

@Jarred-Sumner
Collaborator

What

Bun.JSONL.parseChunk(input, start, end) now honors start/end when input is a string. Previously these offsets only worked for typed arrays and were silently ignored for strings.

Why

This broke error-recovery loops that call parseChunk(data, offset) to skip past a malformed line — with string input the offset was ignored, so the same bad position was re-parsed forever, causing an infinite loop with O(n²) heap growth from repeated concat calls.

How

  • Extracted the start/end clamping logic into a shared lambda parseOffsets(length). This also fixes a latent bug where the code referenced length before it was defined.
  • String path: call parseOffsets(view->length()), use view->substring(start, end - start) when offsets differ from defaults, return read = start + charactersConsumed so the offset is absolute into the original string (UTF-16 code-unit positions).
  • Collapsed the two .d.ts overloads into one; JSDoc now documents byte-vs-character offset semantics.

Tests

  • String offset tests mirroring the existing typed-array offset suite
  • Error-recovery loop regression test (terminates in ≤4 iterations instead of spinning forever)
  • test.each suites for NaN, Infinity, -Infinity, negative, fractional, Number.MAX_SAFE_INTEGER offsets on both string and Buffer — all clamp correctly without crashing

@Jarred-Sumner Jarred-Sumner requested a review from alii as a code owner March 12, 2026 00:03
@robobun
Collaborator

robobun commented Mar 12, 2026

Updated 5:48 PM PT - Mar 11th, 2026

@Jarred-Sumner, your commit b20fe73 has 2 failures in Build #39333 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28025

That installs a local version of the PR into your bun-28025 executable, so you can run:

bun-28025 --bun

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

Walkthrough

The JSONL parseChunk API was expanded to accept mixed input types (strings, TypedArrays, DataView, ArrayBufferLike) with optional start and end offset parameters. Implementation was updated to handle byte offsets for typed arrays and character offsets for strings, with comprehensive test coverage added.

Changes

  • API Type Definitions — packages/bun-types/bun.d.ts
    Widened the parseChunk overload to accept string | NodeJS.TypedArray | DataView | ArrayBufferLike, with optional start/end offset parameters for byte/character windowing.
  • Native Implementation — src/bun.js/bindings/BunObject.cpp
    Added a parseOffsets helper and logic to handle start/end offsets for both TypedArray (byte offsets) and String (character offsets) inputs, including ArrayBuffer validation and readBytes calculation adjustments.
  • Test Coverage — test/js/bun/jsonl/jsonl-parse.test.ts
    Added comprehensive tests for offset handling, including edge cases (NaN, Infinity, negative/fractional values), consistent behavior across string and Uint8Array paths, and BOM/UTF-8 boundary interactions.
🚥 Pre-merge checks: ✅ 2 passed
  • Title check — ✅ Passed. The title accurately describes the main change: fixing JSONL parseChunk to honor start/end offsets for string input, which is the core issue addressed in this PR.
  • Description check — ✅ Passed. The description comprehensively covers both required sections: "What" explains the fix clearly and "How" provides implementation details. A "Why" section is also included explaining the motivation and impact.


@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/bun/jsonl/jsonl-parse.test.ts`:
- Around line 738-786: Add a new test exercising Bun.JSONL.parseChunk with a
start offset that skips a prefix containing a non-BMP (astral) character so we
verify offsets are measured in UTF-16 code units, not bytes/code points; e.g.
mirror the "start offset works for string input (character offsets)" test but
use a prefix like a single astral emoji (U+1F600 or any 2-unit UTF-16 char)
before the first JSONL line, call Bun.JSONL.parseChunk with the correct
character offset (accounting for the surrogate pair), and assert the parsed
values and read position match the expected results (use the same helpers/result
shape as the existing tests).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bea1d441-21c6-43cb-8b58-892a5db569da

📥 Commits

Reviewing files that changed from the base of the PR and between 9d51af8 and b20fe73.

📒 Files selected for processing (3)
  • packages/bun-types/bun.d.ts
  • src/bun.js/bindings/BunObject.cpp
  • test/js/bun/jsonl/jsonl-parse.test.ts

Comment on lines +738 to +786
test("start offset works for string input (character offsets)", () => {
  const result = Bun.JSONL.parseChunk('{"a":1}\n{"b":2}\n', 8);
  expect(result.values).toStrictEqual([{ b: 2 }]);
  expect(result.read).toBe(15);
});

test("end offset works for string input", () => {
  const input = '{"a":1}\n{"b":2}\n{"c":3}\n';
  const result = Bun.JSONL.parseChunk(input, 0, 16);
  expect(result.values).toStrictEqual([{ a: 1 }, { b: 2 }]);
});

test("start+end window works for string input", () => {
  const input = '{"a":1}\n{"b":2}\n{"c":3}\n';
  const result = Bun.JSONL.parseChunk(input, 8, 16);
  expect(result.values).toStrictEqual([{ b: 2 }]);
  expect(result.read).toBe(15);
});

test("string: start equals end returns empty", () => {
  const result = Bun.JSONL.parseChunk('{"a":1}\n', 5, 5);
  expect(result.values).toStrictEqual([]);
  expect(result.read).toBe(5);
  expect(result.done).toBe(true);
});

test("string: error-recovery loop with start offset terminates", () => {
  // Regression: previously start was ignored for strings, so an
  // error-recovery loop that advanced the offset would re-parse the
  // same position forever. Now offsets are honored and each iteration
  // makes progress.
  const input = '{"a":1}\n{bad\n{"c":3}\n';
  const collected: unknown[] = [];
  let offset = 0;
  let iterations = 0;
  while (offset < input.length) {
    iterations++;
    if (iterations > 10) throw new Error("infinite loop");
    const r = Bun.JSONL.parseChunk(input, offset);
    collected.push(...r.values);
    if (!r.error || r.done || r.read >= input.length) break;
    const nl = input.indexOf("\n", r.read);
    if (nl === -1) break;
    offset = nl + 1;
    if (offset <= r.read) offset = r.read + 1; // guarantee progress
  }
  expect(collected).toStrictEqual([{ a: 1 }, { c: 3 }]);
  expect(iterations).toBeLessThanOrEqual(4);
});

⚠️ Potential issue | 🟡 Minor

Add one non-BMP string-offset case.

These new string-offset assertions are all ASCII. Since the implementation measures string offsets in UTF-16 code units, please add a case where the skipped prefix contains an astral character so this path can't silently regress to byte/code-point indexing later.

Suggested coverage addition
+      test("string offsets use UTF-16 code units for astral chars", () => {
+        const input = '{"e":"🎉"}\n{"a":1}\n';
+        const start = '{"e":"🎉"}\n'.length;
+        const result = Bun.JSONL.parseChunk(input, start);
+        expect(result.values).toStrictEqual([{ a: 1 }]);
+        expect(result.read).toBe('{"e":"🎉"}\n{"a":1}'.length);
+      });

@claude bot left a comment

LGTM — clean, well-tested bug fix that extends existing offset-clamping logic to the string path.


Overview

This PR fixes Bun.JSONL.parseChunk to honor start/end offsets when the input is a string. Previously, offsets were silently ignored for strings, causing infinite loops in error-recovery patterns. Three files are touched: BunObject.cpp (core fix), bun.d.ts (type consolidation), and jsonl-parse.test.ts (new tests).

Security risks

The C++ change involves input parsing and substring operations, but the bounds-clamping logic is carried over directly from the existing typed-array path — start and end are clamped to [0, length] with start <= end enforced. The view->substring(start, end - start) call is guarded by the clamping, so there is no risk of out-of-bounds access. No new attack surface is introduced.

Level of scrutiny

The C++ diff is small (~20 lines net change, mostly refactoring existing code into a lambda). The logic is straightforward: extract shared clamping, apply to string path, adjust the returned readBytes to be absolute. The test suite is comprehensive, covering edge cases like NaN, Infinity, negative offsets, error-recovery loops, and start/end boundary conditions for both strings and buffers. This warrants moderate scrutiny but the change is well-contained.

Other factors

No outstanding reviewer comments. No CODEOWNERS conflicts. The .d.ts change is a clean consolidation of two overloads into one with updated JSDoc. The test file adds ~140 lines of new tests that mirror the existing typed-array offset suite, providing strong regression coverage. The bug fix addresses a real user-facing issue (infinite loop with O(n²) heap growth).
