
feat: add EML email file converter #1633

Open
mvanhorn wants to merge 1 commit into microsoft:main from mvanhorn:osc/feat-eml-converter

Summary

Adds support for converting RFC 822 .eml files to Markdown using Python's built-in email module. Zero new dependencies. Follows the OutlookMsgConverter pattern exactly.

Why this matters

markitdown handles Outlook .msg files but not standard .eml files (#89). EML is the common interchange format for email, used by Thunderbird, Apple Mail, Gmail exports, and email forensics tools. Anyone processing email archives for LLM analysis needs this.

Changes

  • _eml_converter.py: New converter handling message/rfc822 MIME type and .eml extension
  • Multipart MIME support: prefers text/plain, falls back to text/html with tag stripping
  • Extracts From, To, Cc, Subject, Date headers
  • Registered in _markitdown.py alongside OutlookMsgConverter
  • Two test vectors added: plain text email and multipart (text + HTML)
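
The body-selection and header-extraction behaviour listed above can be illustrated with a short standalone sketch. This is not the code in `_eml_converter.py`; it only shows one way to get the same effect with the standard library. The names `strip_tags` and `eml_to_markdown`, and the `# Email Message` / `## Content` headings, are assumptions made for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    # Hypothetical helper: drop markup, keep the text content.
    parser = _TagStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


def eml_to_markdown(raw: bytes) -> str:
    # policy.default yields an EmailMessage, which provides get_body().
    msg = message_from_bytes(raw, policy=policy.default)

    # Output layout is an assumption, not taken from the PR.
    lines = ["# Email Message", ""]
    for name in ("From", "To", "Cc", "Subject", "Date"):
        value = msg[name]
        if value:  # skip headers that are absent
            lines.append(f"**{name}:** {value}")

    # Prefer text/plain; fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = ""
    if part is not None:
        body = part.get_content()
        if part.get_content_type() == "text/html":
            body = strip_tags(body)

    lines += ["", "## Content", "", body.strip()]
    return "\n".join(lines)


# Usage: build a multipart/alternative message and convert it.
sample = EmailMessage()
sample["From"] = "alice@example.com"
sample["To"] = "bob@example.com"
sample["Subject"] = "Hello"
sample.set_content("plain body")
sample.add_alternative("<p>html <b>body</b></p>", subtype="html")
print(eml_to_markdown(sample.as_bytes()))
```

Using `get_body` with a preference list keeps the plain-over-HTML choice in one call rather than a hand-written walk over the MIME parts.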

Testing

Both test vectors pass test_guess_stream_info and test_convert_local:

tests/test_module_vectors.py::test_guess_stream_info[test_vector5] PASSED
tests/test_module_vectors.py::test_guess_stream_info[test_vector6] PASSED
tests/test_module_vectors.py::test_convert_local[test_vector5] PASSED
tests/test_module_vectors.py::test_convert_local[test_vector6] PASSED

Relates to #89

This contribution was developed with AI assistance (Claude Code).

Add support for converting RFC 822 .eml files to Markdown, using
Python's built-in email module (zero new dependencies). Follows the
same pattern as OutlookMsgConverter.

Handles:
- Plain text emails
- Multipart MIME (prefers text/plain, falls back to HTML with tag stripping)
- From, To, Cc, Subject, Date headers

Relates to microsoft#89

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>