Add password support for PDF conversion #1629

Open
mvanhorn wants to merge 1 commit into microsoft:main from mvanhorn:feat/pdf-password-support

@mvanhorn

Problem

MarkItDown fails with unhelpful errors when encountering password-protected PDF files. There is no way to supply a decryption password through the Python API or CLI.

Reported in #1585.

Solution

Added an optional password parameter that flows through the existing **kwargs pipeline to the PDF converter:

Python API:

from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("protected.pdf", password="secret")

CLI:

markitdown protected.pdf --password secret
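
A minimal sketch of how a --password flag could be forwarded as a keyword argument, assuming an argparse-based CLI. The helper names build_parser and convert_kwargs are hypothetical and are not taken from __main__.py; only the flag name and the password keyword come from this PR.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the --password flag mirrors this PR.
    parser = argparse.ArgumentParser(prog="markitdown")
    parser.add_argument("filename", nargs="?", help="file to convert")
    parser.add_argument(
        "--password",
        default=None,
        help="password used to decrypt an encrypted PDF",
    )
    return parser


def convert_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # Forward the password only when the user supplied one, so that
    # converters for other formats never see an unexpected keyword.
    kwargs: Dict[str, Any] = {}
    if args.password is not None:
        kwargs["password"] = args.password
    return kwargs


args = build_parser().parse_args(["protected.pdf", "--password", "secret"])
extra = convert_kwargs(args)
```

The resulting dictionary would then be unpacked into the convert call, for example md.convert(args.filename, **extra).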

Changes

  • _pdf_converter.py: Extract password from kwargs and pass it to both pdfplumber.open() and pdfminer.high_level.extract_text(). Catch PDFPasswordIncorrect and raise a clear FileConversionException with a helpful message.
  • __main__.py: Add --password CLI argument and pass it through to the convert calls.
  • New test file test_pdf_password.py with 5 tests covering correct password, missing password, wrong password, non-encrypted PDF regression, and CLI flag presence.
  • New test fixture test_password.pdf (encrypted with password "testpassword").
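
The converter change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in _pdf_converter.py: the two exception classes are local stand-ins for pdfminer's PDFPasswordIncorrect and MarkItDown's FileConversionException, and convert_pdf and fake_extract_text are hypothetical names. The extractor is injected so the sketch runs without pdfminer installed; in the real converter that role would be played by pdfminer.high_level.extract_text, which accepts a password keyword.

```python
import io
from typing import Any, BinaryIO, Callable


class PDFPasswordIncorrect(Exception):
    """Local stand-in for pdfminer's exception of the same name."""


class FileConversionException(Exception):
    """Local stand-in for MarkItDown's conversion error."""


def convert_pdf(
    file_stream: BinaryIO,
    extract_text: Callable[..., str],
    **kwargs: Any,
) -> str:
    # Pull the optional password out of kwargs; an empty string
    # means "no password" for the extractor.
    password = kwargs.get("password") or ""
    try:
        return extract_text(file_stream, password=password)
    except PDFPasswordIncorrect as exc:
        reason = (
            "the supplied password is incorrect"
            if password
            else "no password was supplied"
        )
        raise FileConversionException(
            f"PDF is password-protected and {reason}. "
            "Pass password=... or use --password."
        ) from exc


def fake_extract_text(stream: BinaryIO, password: str = "") -> str:
    # Hypothetical extractor that imitates an encrypted PDF.
    if password != "testpassword":
        raise PDFPasswordIncorrect()
    return "hello"


text = convert_pdf(io.BytesIO(b""), fake_extract_text, password="testpassword")
```

Distinguishing a missing password from a wrong one in the message is a design choice of this sketch; the PR only states that both cases raise a clear FileConversionException.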

Scope

This PR covers PDF password support only. DOCX and XLSX password support (also mentioned in #1585) can follow in separate PRs to keep this one focused and reviewable.

No new dependencies

Both pdfminer.six and pdfplumber already support the password parameter natively.

This contribution was developed with AI assistance (Claude Code).

Pass an optional password parameter through to pdfminer and pdfplumber
when converting encrypted PDFs. Raise a clear FileConversionException
when the password is missing or incorrect. Add a --password CLI flag.

Fixes microsoft#1585