text-page-separator flag accepted but not applied to text output #39

@matthew-gizmo

3a384491_1769098711502_BWL_Zusammenfassung.pdf

Summary

--text-page-separator appears to be accepted by the CLI but is not actually applied to the emitted text output.

I verified this on edgeparse 0.2.5.

Interesting project, let me know if you are looking for potential collaborators!

Minimal reproduction

edgeparse '/path/to/file.pdf' \
  --format text \
  --output-dir /tmp/edgeparse-issue-repro \
  --text-page-separator '[[PAGE %page-number%]]'

Example real command I used:

~/edgeparse '/Users/mjgp2/Library/CloudStorage/GoogleDrive-matthew@gizmo.ai/Shared drives/PDFs/sample-pdf/3a384491_1769098711502_BWL_Zusammenfassung.pdf' \
  --format text \
  --output-dir /tmp/edgeparse-issue-repro \
  --text-page-separator '[[PAGE %page-number%]]'

Actual behavior

  • CLI exits successfully
  • text output file is written
  • output does not contain any [[PAGE N]] markers

I confirmed with:

rg -n '\[\[PAGE ' /tmp/edgeparse-issue-repro/3a384491_1769098711502_BWL_Zusammenfassung.txt

which produced no matches.
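For anyone without ripgrep installed, the same check can be sketched with plain grep. The helper name `count_page_markers` is my own and not part of edgeparse; it only counts lines containing the literal marker prefix from the reproduction above.

```shell
# Hypothetical helper (not part of edgeparse): count the lines in a text file
# that contain the literal marker prefix "[[PAGE ".
count_page_markers() {
  # -c prints the number of matching lines, -F treats the pattern as a fixed
  # string so the brackets are not interpreted as a regex.
  # grep exits non-zero when nothing matches, so "|| true" keeps the function
  # usable in scripts that run with "set -e".
  grep -cF -- '[[PAGE ' "$1" || true
}
```

Calling `count_page_markers` on the `.txt` file written to the output directory should print `0` while the bug is present, and a non-zero count once the separator is applied.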

Expected behavior

The emitted .txt output should include the requested page separator, for example:

  • [[PAGE 1]]
  • [[PAGE 2]]
  • etc.
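To make the expectation concrete, here is a minimal sketch of the substitution I would expect, assuming `%page-number%` is a placeholder that gets replaced with the 1-based page number. The function `render_separator` is hypothetical and only illustrates the expected output; it is not how edgeparse implements the flag.

```shell
# Hypothetical illustration only: expand %page-number% in a separator template.
# Arguments: $1 = separator template, $2 = page number (digits only).
render_separator() {
  template=$1
  page=$2
  # The template is passed as input to sed, so its brackets are not treated as
  # regex syntax; only the literal placeholder is matched and replaced.
  printf '%s\n' "$template" | sed "s/%page-number%/$page/g"
}

# Print the separators expected for a three-page document.
for n in 1 2 3; do
  render_separator '[[PAGE %page-number%]]' "$n"
done
```

Under that assumption, each page in the `.txt` output would be accompanied by one such line.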

Notes

  • --markdown-page-separator appears to work correctly
  • this seems specific to --text-page-separator
  • I also observed the same behavior when text was requested alongside other formats, e.g. --format markdown-with-html,json,text
