Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces initial support for generating tagged/accessible PDFs by building a PDF structure tree during rendering, with layout-time metadata to drive tagging.
Changes:
- Add an
AccessibilityTaggerand integrate it into the render pipeline whendocDefinition.taggedis enabled. - Annotate layout output (lines, images, SVGs, canvas vectors) with accessibility context (roles + table/list context) to drive structure tags.
- Update unbreakable-block committing logic to iterate multiple pages (instead of only the first page).
Reviewed changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/printer.js |
Wires accessibility tagging into page rendering; marks artifacts; handles table/list/link tagging during line rendering. |
src/layoutBuilder.js |
Produces _accessibilityContext metadata for lines and image/SVG/vector nodes; tracks list/table context while laying out content. |
src/accessibilityTagger.js |
New module that builds PDF structure elements (Document/Sect/P/H/L/Table/Figure/Link/Artifact) using PDFKit structure APIs. |
src/accessibilityMetadata.js |
New helper for determining accessibility roles and whether tables should be tagged/treated as TOC. |
src/pageElementWriter.js |
Changes unbreakable block commit logic to process all pages in the transaction context. |
src/elementWriter.js |
Fixes pushContext height calculation fallback to use pageSize.height when page.height is absent. |
webpack.config.js |
Keeps console statements in minified output by setting drop_console: false. |
package.json |
Renames package to @wellcometrust/pdfmake and bumps version to 2.1.0. |
src/3rd-party/svg-to-pdfkit/source.js |
Vendor source update (appears to be a sync/copy of upstream file). |
Comments suppressed due to low confidence (1)
src/pageElementWriter.js:126
commitUnbreakableBlocknow iterates allunbreakableContext.pages, but the height calculation usesunbreakableContext.getCurrentPage().pageSize.heightinside the loop.getCurrentPage()returns whateverunbreakableContext.pageis currently set to, so for multi-page unbreakable blocks this can read the wrong page size for most fragments. Usefragment.pageSize.height(or setunbreakableContext.page = currentPagebefore reading) so each fragment uses its own page size.
for (let currentPage = 0; currentPage < nbPages; currentPage++) {
var fragment = unbreakableContext.pages[currentPage];
fragment.xOffset = forcedX;
fragment.yOffset = forcedY;
//TODO: vectors can influence height in some situations
if (nbPages > 1) {
// on out-of-context blocs (headers, footers, background) height should be the whole DocumentContext height
if (forcedX !== undefined || forcedY !== undefined) {
fragment.height = unbreakableContext.getCurrentPage().pageSize.height - unbreakableContext.pageMargins.top - unbreakableContext.pageMargins.bottom;
} else {
fragment.height = this.writer.context.getCurrentPage().pageSize.height - this.writer.context.pageMargins.top - this.writer.context.pageMargins.bottom;
for (var i = 0, l = this.repeatables.length; i < l; i++) {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (listCtx) { | ||
| if (!prevListCtx) { | ||
| // Entering a list | ||
| tagger.beginList(); | ||
| } else if (listCtx.depth > prevListCtx.depth) { | ||
| // Entering a nested list | ||
| tagger.beginList(); | ||
| } else if (listCtx.depth < prevListCtx.depth) { | ||
| // Leaving a nested list | ||
| tagger.endList(); | ||
| } | ||
|
|
||
| // New list item? | ||
| if (listCtx.isFirstInItem) { | ||
| tagger.beginListItem(); | ||
| } | ||
| } else if (prevListCtx) { | ||
| // Leaving list context entirely | ||
| tagger.endList(); | ||
| } |
There was a problem hiding this comment.
List structure transitions only call tagger.endList() once when the list nesting depth decreases or when leaving list context entirely. If rendering jumps from a nested list depth >1 directly to depth 0 (e.g., the last list item contains only a nested list), this will leave the outer list(s) open and produce an invalid/incorrect structure tree. Update the logic to close lists in a loop until the open list depth matches the new listCtx.depth (or 0 when listCtx is null).
| if (nbPages > 0) { | ||
| // no support for multi-page unbreakableBlocks | ||
| var fragment = unbreakableContext.pages[0]; | ||
| for (let currentPage = 0; currentPage < nbPages; currentPage++) { |
There was a problem hiding this comment.
This file is included in the browser bundle, but the webpack Babel rule only transpiles a limited set of files (see webpack.config.js include regex) targeting IE11. Introducing let here means the output may contain untranspiled ES2015 syntax and break IE11 builds. Either avoid let/const in non-transpiled files or update the Babel include list to cover this module.
| for (let currentPage = 0; currentPage < nbPages; currentPage++) { | |
| for (var currentPage = 0; currentPage < nbPages; currentPage++) { |
| }, | ||
| compress: { | ||
| drop_console: true | ||
| drop_console: false |
There was a problem hiding this comment.
Setting drop_console: false will retain console.* calls in the minified browser build. If this is intended, it should be documented; otherwise it increases bundle size and can leak internal/debug output in production. Consider restoring drop_console: true (or gating this by build mode).
| drop_console: false | |
| drop_console: true |
| { | ||
| "name": "pdfmake", | ||
| "version": "0.2.23", | ||
| "name": "@wellcometrust/pdfmake", | ||
| "version": "2.1.0", | ||
| "description": "Client/server side PDF printing in pure JavaScript", |
There was a problem hiding this comment.
This PR is titled/described as adding accessibility tagging, but it also changes the published package name and bumps the version from 0.2.23 to 2.1.0. That’s a significant distribution/API change that should either be called out in the PR description/release notes or split into a separate PR to keep the change set scoped.
| var tagger = null; | ||
| if (docDefinition.tagged) { | ||
| tagger = new AccessibilityTagger(this.pdfKitDoc); | ||
| tagger.initDocument(); | ||
| } | ||
|
|
||
| renderPages(pages, this.fontProvider, this.pdfKitDoc, patterns, options.progressCallback, tagger); | ||
|
|
||
| if (tagger) { | ||
| tagger.finalise(); | ||
| } |
There was a problem hiding this comment.
Accessibility tagging is newly introduced here (creating an AccessibilityTagger, passing it through rendering, and finalising it), but there are no corresponding tests asserting that tagged PDFs emit the expected structure tree / marked-content operations. Since tests/printer.js exists, consider adding coverage that verifies the tagger is invoked when docDefinition.tagged is true (e.g., spies on PDFKit struct/mark calls) and not invoked otherwise.
| // Track what's currently open for page break handling | ||
| this.openStructures = []; | ||
|
|
There was a problem hiding this comment.
openStructures is initialized but never used, which makes it harder to tell what state is actually required for tagging. Consider removing it or wiring it into the page-break handling logic if it’s meant to track open elements.
| // Track what's currently open for page break handling | |
| this.openStructures = []; |
What does this change?
How to test
How can we measure success?
Have we considered potential risks?