Skip to content

Using LLM to describe images into .docx files#1435

Open
carlodek wants to merge 3 commits intomicrosoft:mainfrom
carlodek:main
Open

Using LLM to describe images into .docx files#1435
carlodek wants to merge 3 commits intomicrosoft:mainfrom
carlodek:main

Conversation

@carlodek
Copy link

@carlodek carlodek commented Oct 1, 2025

Image description with LLM into docx docs

What I've done:

  1. Modified converter_utils/docx/pre_process.py to detect images and put the description generated by LLM into the right place.
  2. Moved _llm_caption file into main folder as it will be used by pre_process file too.
  3. Added an image to test it: docx_with_image_test.docx into test_files folder.

How to test:

I've tested it with AzureOpenAI, here it's a code snippet:

from packages.markitdown.src.markitdown import MarkItDown
from openai import AzureOpenAI

if __name__ == "__main__":
    AZURE_OPEN_AI_ENDPOINT = "<your_endpoint>
    AZURE_OPEN_AI_DEPLOYMENT = "<your_deployment>"
    AZURE_OPEN_AI_KEY = "<your_api_key>"
    AZURE_OPEN_AI_API_VERSION = "<your_version>"
    file_path = "tests/test_files/docx_with_image_test.docx"
    client = AzureOpenAI(
        azure_endpoint=AZURE_OPEN_AI_ENDPOINT,
        api_key=AZURE_OPEN_AI_KEY,
        api_version=AZURE_OPEN_AI_API_VERSION
    )
    md = MarkItDown(llm_client=client, llm_model=AZURE_OPEN_AI_DEPLOYMENT, llm_prompt="Please describe the image")
    result = md.convert(file_path)
    print(result.markdown)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants