fix(core): preserve whitespace edge cases but collapse html formatting newlines (BLO-1065)#2551
fix(core): preserve whitespace edge cases but collapse html formatting newlines (BLO-1065)#2551
Conversation
…g newlines (BLO-1065) - Added a targeted DOM preprocessing step before ProseMirror parsing. - Explictly replicates CSS white-space: normal behavior internally to fix MS Word line breaks. - Retains preserveWhitespace: true in PM to satisfy AI diffing constraints from PR #2230. - Skips Notion HTML to preserve intentional hard breaks.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Background Context: Why DOM Preprocessing?This PR resolves a conflict between two parsing requirements:
To support requirement 2, PR #2230 set ProseMirror's Why not change
Conclusion: |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe PR introduces HTML whitespace preprocessing functionality to normalize non-Notion HTML before parsing. A new utility module detects Notion-specific HTML markers and collapses whitespace in text nodes while preserving content within code elements. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
@blocknote/ariakit
@blocknote/code-block
@blocknote/core
@blocknote/mantine
@blocknote/react
@blocknote/server-util
@blocknote/shadcn
@blocknote/xl-ai
@blocknote/xl-docx-exporter
@blocknote/xl-email-exporter
@blocknote/xl-multi-column
@blocknote/xl-odt-exporter
@blocknote/xl-pdf-exporter
commit: |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
wordcopy.txt (1)
11-21: Local file paths in test fixture.This file contains local filesystem paths (
file:////Users/yousef/Library/...) which are not functionally relevant and could be removed or anonymized for cleaner test data. These paths won't affect the parsing since they're in<link>elements that don't impact the text content extraction.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@wordcopy.txt` around lines 11 - 21, The test fixture contains literal local file URIs in the <link> elements (e.g., the file:////Users/yousef/... values in the href attributes) which should be removed or anonymized; edit wordcopy.txt to either delete those <link rel=File-List>, <link rel=themeData> and <link rel=colorSchemeMapping> lines or replace their href values with neutral/dummy paths (e.g., relative or placeholder URIs) so the fixture no longer contains host-specific local filesystem paths while preserving the structure for parsing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@wordcopy.txt`:
- Around line 11-21: The test fixture contains literal local file URIs in the
<link> elements (e.g., the file:////Users/yousef/... values in the href
attributes) which should be removed or anonymized; edit wordcopy.txt to either
delete those <link rel=File-List>, <link rel=themeData> and <link
rel=colorSchemeMapping> lines or replace their href values with neutral/dummy
paths (e.g., relative or placeholder URIs) so the fixture no longer contains
host-specific local filesystem paths while preserving the structure for parsing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e26020a-caff-4045-ae6e-4a1ae93907d8
📒 Files selected for processing (7)
packages/core/src/api/parsers/html/parseHTML.tspackages/core/src/api/parsers/html/util/normalizeWhitespace.tstests/src/unit/core/clipboard/paste/pasteTestInstances.tstests/src/unit/core/formatConversion/parse/__snapshots__/html/mixedTextTableCell.jsontests/src/unit/core/formatConversion/parse/__snapshots__/html/msWordPaste.jsontests/src/unit/core/formatConversion/parse/parseTestInstances.tswordcopy.txt
…rd-is-broken-blo-1065
Summary
Fixes the issue where copying and pasting from MS Word (or other sources with HTML source formatting) introduced extra, unintended line breaks. (closes #2548, closes #2356)
Rationale
When HTML is pasted, source code line breaks (`\n`) within elements like `
` were being converted into hard breaks in BlockNote. While standard HTML collapses these into spaces, PR #2230 introduced `preserveWhitespace: true` to prevent the stripping of leading and trailing spaces during AI diffing. This global flag disabled all whitespace collapsing, causing every formatting newline to appear as a visible hard break. This PR introduces a targeted preprocessing step to resolve both needs simultaneously.
Changes
Impact
Fixes pasting from MS Word and preserves the behavior required by PR #2230 for AI diffing. No negative impacts expected; Notion pasting behavior is fully preserved.
Testing
Screenshots/Video
N/A
Checklist
Additional Notes
See PR comment for more technical details on why native ProseMirror `preserveWhitespace` settings were insufficient.
Summary by CodeRabbit
Bug Fixes
Tests