Universal web content extraction — any URL to LLM-ready markdown.
| Skill | Description |
|---|---|
| extract-url | Extract content from a web URL (HTML, YouTube, PDF) |
| extract-file | Convert local PDF/DOCX to markdown |
| batch-extract | Bulk extract from multiple URLs |

    pip install markgrab           # core
    pip install "markgrab[all]"    # all content types

- HTML: content density filtering, automatic fallback to Playwright for JS-heavy sites
- YouTube: transcript extraction with timestamps and multi-language support
- PDF: text extraction with page structure
- DOCX: paragraph and heading extraction
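
To illustrate how a single URL could be routed to one of the content types listed above, here is a minimal sketch using only the Python standard library. It is not MarkGrab's actual detection logic: the function name `guess_content_type` and the `YOUTUBE_HOSTS` set are hypothetical, introduced only for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical host list for illustration; a real tool may recognize more hosts.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def guess_content_type(url: str) -> str:
    """Return 'youtube', 'pdf', 'docx', or 'html' based on the URL alone."""
    parsed = urlparse(url)

    # urlparse lowercases the hostname; it is None for URLs without a host.
    host = parsed.hostname or ""
    if host in YOUTUBE_HOSTS:
        return "youtube"

    # Use only the path component, so query strings do not hide the extension.
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"

    # Anything else is treated as an ordinary web page.
    return "html"
```

A URL-only check like this is a heuristic. A PDF served from a path with no extension would be classified as `html`, so a more robust approach would probably also inspect the response's `Content-Type` header.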