edgeparse

High-performance PDF-to-structured-data extraction for Python, powered by a Rust engine via PyO3.

Install

pip install edgeparse

Pre-built wheels are available for macOS, Linux (x86_64, arm64), and Windows (x64). No system dependencies or compilation required.

Quick start

import edgeparse

# Convert a PDF to Markdown
result = edgeparse.convert("document.pdf")
print(result.markdown)

# Convert with options
result = edgeparse.convert(
    "document.pdf",
    format="markdown",      # "markdown" | "json" | "html"
    extract_images=False,
    page_range=None,        # None = all pages, or [0, 5] for pages 1–6
)
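The `page_range` comment above suggests a 0-based, inclusive `[start, end]` pair, which is easy to get off by one when working from printed page numbers. The following is a minimal sketch of a helper that maps 1-based page numbers to that form. `to_page_range` and `convert_pages` are hypothetical names, not part of edgeparse, and the indexing convention is inferred only from the "[0, 5] for pages 1–6" comment.

```python
from typing import List


def to_page_range(first_page: int, last_page: int) -> List[int]:
    """Map 1-based, inclusive page numbers to a 0-based [start, end] pair.

    Assumes the convention implied by the quick-start comment,
    where [0, 5] selects pages 1-6.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [first_page - 1, last_page - 1]


def convert_pages(path: str, first_page: int, last_page: int):
    """Hypothetical wrapper: convert only the given 1-based page span."""
    # Imported lazily so to_page_range stays usable without edgeparse installed.
    import edgeparse

    return edgeparse.convert(
        path,
        format="markdown",
        page_range=to_page_range(first_page, last_page),
    )
```

With this in place, `convert_pages("document.pdf", 1, 6)` would pass `page_range=[0, 5]` to `edgeparse.convert`, matching the example in the quick start.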

CLI

edgeparse document.pdf                     # → Markdown on stdout
edgeparse document.pdf --format json       # → JSON
edgeparse /path/to/dir/ --output-dir out/  # batch convert
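For batch conversion from Python rather than the shell, a rough equivalent can be sketched using only the `edgeparse.convert(...)` call and `result.markdown` attribute shown in the quick start. `convert_directory` is a hypothetical helper, and the `<name>.md` output naming is an assumption; the CLI's own naming scheme may differ. The converter is passed in as a callable so the loop can be exercised without edgeparse installed.

```python
from pathlib import Path
from typing import Callable, List, Optional


def _edgeparse_markdown(pdf_path: str) -> str:
    """Default converter: uses only the API shown in the quick start."""
    import edgeparse  # lazy import; only needed when this default is used

    return edgeparse.convert(pdf_path).markdown


def convert_directory(
    src_dir: str,
    out_dir: str,
    to_markdown: Optional[Callable[[str], str]] = None,
) -> List[Path]:
    """Convert every *.pdf directly inside src_dir to a .md file in out_dir.

    Returns the list of files written, in sorted order of the source names.
    """
    if to_markdown is None:
        to_markdown = _edgeparse_markdown

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: List[Path] = []
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        # Assumed naming: report.pdf -> report.md
        target = out / (pdf.stem + ".md")
        target.write_text(to_markdown(str(pdf)), encoding="utf-8")
        written.append(target)
    return written
```

Calling `convert_directory("/path/to/dir/", "out/")` would then process the directory one file at a time; it does not recurse into subdirectories.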

Performance

edgeparse consistently leads open benchmarks for PDF-to-Markdown extraction quality across 200-document test suites.

Links