High-performance PDF-to-structured-data extraction for Python — powered by a Rust engine via PyO3.
pip install edgeparse

Pre-built wheels are available for macOS, Linux (x86_64, arm64), and Windows (x64). No system dependencies or compilation required.
import edgeparse
# Convert a PDF to Markdown
result = edgeparse.convert("document.pdf")
print(result.markdown)
# Convert with options
result = edgeparse.convert(
    "document.pdf",
    format="markdown",     # "markdown" | "json" | "html"
    extract_images=False,
    page_range=None,       # None = all pages, or [0, 5] for pages 1–6
)

edgeparse document.pdf # → Markdown on stdout
edgeparse document.pdf --format json # → JSON
edgeparse /path/to/dir/ --output-dir out/ # batch convert

edgeparse consistently leads open benchmarks for PDF-to-Markdown extraction quality across 200-document test suites.
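For batch conversion from Python rather than the CLI, a thin wrapper around `edgeparse.convert` may be enough. The sketch below is illustrative only: `to_page_range` and `convert_directory` are hypothetical helpers, not part of edgeparse. It assumes `page_range` takes a 0-based, inclusive `[start, end]` pair (inferred from the `[0, 5]` for pages 1–6 comment above) and that `result.markdown` is a string.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional


def to_page_range(first_page: int, last_page: int) -> list[int]:
    """Map 1-based, inclusive page numbers to a 0-based [start, end] pair.

    Assumption: edgeparse's page_range is 0-based and inclusive.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [first_page - 1, last_page - 1]


def convert_directory(
    src_dir: str | Path,
    out_dir: str | Path,
    convert: Optional[Callable] = None,
    page_range: Optional[list[int]] = None,
) -> list[Path]:
    """Convert every *.pdf directly inside src_dir to a .md file in out_dir."""
    if convert is None:
        # Imported lazily so the helper can be exercised with a stub converter.
        import edgeparse

        convert = edgeparse.convert

    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written: list[Path] = []
    for pdf in sorted(src.glob("*.pdf")):
        # Same keyword arguments as the "Convert with options" call above.
        result = convert(
            str(pdf),
            format="markdown",
            extract_images=False,
            page_range=page_range,
        )
        target = out / (pdf.stem + ".md")
        target.write_text(result.markdown, encoding="utf-8")
        written.append(target)
    return written
```

A call such as `convert_directory("pdfs/", "out/", page_range=to_page_range(1, 6))` would then write one Markdown file per input PDF, covering pages 1–6 of each. Passing `convert` explicitly makes it possible to test the wrapper without the Rust engine installed.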