Parsing PDF documents with paths instead of text #733

jdales · 2023-11-09T13:51:25Z

We've been using PdfPig to process PDF forms and, until recently, had great success extracting text from documents. We've now run into documents whose pages contain no letters, so text extraction returns nothing. The operations in the documents we extract successfully include a series of text operations (e.g. TextObjects.BeginText, TextObjects.SetFontAndSize, TextObjects.ShowText), but these new documents instead contain PathConstruction operations (e.g. PathConstruction.AppendStraightLineSegment, PathConstruction.BeginNewSubpath). Is there a way for us to extract the text from these new documents?

topcat30 · 2023-12-19T03:38:41Z

Maybe hOCR would work.
