Skip to content

Add accessibilty tagging#2

Open
stuartwilkes wants to merge 3 commits intowellcome-mainfrom
task/accessibilty-v2
Open

Add accessibilty tagging#2
stuartwilkes wants to merge 3 commits intowellcome-mainfrom
task/accessibilty-v2

Conversation

@stuartwilkes
Copy link

What does this change?

How to test

How can we measure success?

Have we considered potential risks?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces initial support for generating tagged/accessible PDFs by building a PDF structure tree during rendering, with layout-time metadata to drive tagging.

Changes:

  • Add an AccessibilityTagger and integrate it into the render pipeline when docDefinition.tagged is enabled.
  • Annotate layout output (lines, images, SVGs, canvas vectors) with accessibility context (roles + table/list context) to drive structure tags.
  • Update unbreakable-block committing logic to iterate multiple pages (instead of only the first page).

Reviewed changes

Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/printer.js Wires accessibility tagging into page rendering; marks artifacts; handles table/list/link tagging during line rendering.
src/layoutBuilder.js Produces _accessibilityContext metadata for lines and image/SVG/vector nodes; tracks list/table context while laying out content.
src/accessibilityTagger.js New module that builds PDF structure elements (Document/Sect/P/H/L/Table/Figure/Link/Artifact) using PDFKit structure APIs.
src/accessibilityMetadata.js New helper for determining accessibility roles and whether tables should be tagged/treated as TOC.
src/pageElementWriter.js Changes unbreakable block commit logic to process all pages in the transaction context.
src/elementWriter.js Fixes pushContext height calculation fallback to use pageSize.height when page.height is absent.
webpack.config.js Keeps console statements in minified output by setting drop_console: false.
package.json Renames package to @wellcometrust/pdfmake and bumps version to 2.1.0.
src/3rd-party/svg-to-pdfkit/source.js Vendor source update (appears to be a sync/copy of upstream file).
Comments suppressed due to low confidence (1)

src/pageElementWriter.js:126

  • commitUnbreakableBlock now iterates all unbreakableContext.pages, but the height calculation uses unbreakableContext.getCurrentPage().pageSize.height inside the loop. getCurrentPage() returns whatever unbreakableContext.page is currently set to, so for multi-page unbreakable blocks this can read the wrong page size for most fragments. Use fragment.pageSize.height (or set unbreakableContext.page = currentPage before reading) so each fragment uses its own page size.
		for (let currentPage = 0; currentPage < nbPages; currentPage++) {
			var fragment = unbreakableContext.pages[currentPage];
			fragment.xOffset = forcedX;
			fragment.yOffset = forcedY;

			//TODO: vectors can influence height in some situations
			if (nbPages > 1) {
				// on out-of-context blocs (headers, footers, background) height should be the whole DocumentContext height
				if (forcedX !== undefined || forcedY !== undefined) {
					fragment.height = unbreakableContext.getCurrentPage().pageSize.height - unbreakableContext.pageMargins.top - unbreakableContext.pageMargins.bottom;
				} else {
					fragment.height = this.writer.context.getCurrentPage().pageSize.height - this.writer.context.pageMargins.top - this.writer.context.pageMargins.bottom;
					for (var i = 0, l = this.repeatables.length; i < l; i++) {

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +730 to +749
if (listCtx) {
if (!prevListCtx) {
// Entering a list
tagger.beginList();
} else if (listCtx.depth > prevListCtx.depth) {
// Entering a nested list
tagger.beginList();
} else if (listCtx.depth < prevListCtx.depth) {
// Leaving a nested list
tagger.endList();
}

// New list item?
if (listCtx.isFirstInItem) {
tagger.beginListItem();
}
} else if (prevListCtx) {
// Leaving list context entirely
tagger.endList();
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List structure transitions only call tagger.endList() once when the list nesting depth decreases or when leaving list context entirely. If rendering jumps from a nested list depth >1 directly to depth 0 (e.g., the last list item contains only a nested list), this will leave the outer list(s) open and produce an invalid/incorrect structure tree. Update the logic to close lists in a loop until the open list depth matches the new listCtx.depth (or 0 when listCtx is null).

Copilot uses AI. Check for mistakes.
if (nbPages > 0) {
// no support for multi-page unbreakableBlocks
var fragment = unbreakableContext.pages[0];
for (let currentPage = 0; currentPage < nbPages; currentPage++) {
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is included in the browser bundle, but the webpack Babel rule only transpiles a limited set of files (see webpack.config.js include regex) targeting IE11. Introducing let here means the output may contain untranspiled ES2015 syntax and break IE11 builds. Either avoid let/const in non-transpiled files or update the Babel include list to cover this module.

Suggested change
for (let currentPage = 0; currentPage < nbPages; currentPage++) {
for (var currentPage = 0; currentPage < nbPages; currentPage++) {

Copilot uses AI. Check for mistakes.
},
compress: {
drop_console: true
drop_console: false
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting drop_console: false will retain console.* calls in the minified browser build. If this is intended, it should be documented; otherwise it increases bundle size and can leak internal/debug output in production. Consider restoring drop_console: true (or gating this by build mode).

Suggested change
drop_console: false
drop_console: true

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 4
{
"name": "pdfmake",
"version": "0.2.23",
"name": "@wellcometrust/pdfmake",
"version": "2.1.0",
"description": "Client/server side PDF printing in pure JavaScript",
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is titled/described as adding accessibility tagging, but it also changes the published package name and bumps the version from 0.2.23 to 2.1.0. That’s a significant distribution/API change that should either be called out in the PR description/release notes or split into a separate PR to keep the change set scoped.

Copilot uses AI. Check for mistakes.
Comment on lines +181 to +191
var tagger = null;
if (docDefinition.tagged) {
tagger = new AccessibilityTagger(this.pdfKitDoc);
tagger.initDocument();
}

renderPages(pages, this.fontProvider, this.pdfKitDoc, patterns, options.progressCallback, tagger);

if (tagger) {
tagger.finalise();
}
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessibility tagging is newly introduced here (creating an AccessibilityTagger, passing it through rendering, and finalising it), but there are no corresponding tests asserting that tagged PDFs emit the expected structure tree / marked-content operations. Since tests/printer.js exists, consider adding coverage that verifies the tagger is invoked when docDefinition.tagged is true (e.g., spies on PDFKit struct/mark calls) and not invoked otherwise.

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +62
// Track what's currently open for page break handling
this.openStructures = [];

Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openStructures is initialized but never used, which makes it harder to tell what state is actually required for tagging. Consider removing it or wiring it into the page-break handling logic if it’s meant to track open elements.

Suggested change
// Track what's currently open for page break handling
this.openStructures = [];

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants