The pages of my multi-page PDF contain a mix of digital text and scanned images that can be OCR'd.
In this case, `--redo-ocr` is the current option: it blanks out certain areas of the page from OCR while avoiding rasterizing. However, its results can be poorer than those of `--force-ocr` on the same page. `--force-ocr`, on the other hand, rasterizes the page and can increase the file size. It also applies to all pages, regardless of whether it is the best option for each one (if a page has no text but only images, the normal workflow without a full-page raster is the best option).
So, can we detect whether a page has text and apply force-ocr only to those pages? If a page has no text, ocrmypdf would proceed as normal and only overlay a text layer on top of the original page. If a page has text, it would force-ocr and rasterize that page.
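As a rough illustration of the per-page decision described above, here is a minimal Python sketch. It is not part of OCRmyPDF: `plan_ocr` and `compress_ranges` are hypothetical helper names, and the `page_has_text` list is assumed to come from some separate text-detection step that is not shown. The sketch only splits pages into the two groups and formats each group as a compact page-range string.

```python
from typing import Dict, Iterable, List, Tuple


def compress_ranges(pages: Iterable[int]) -> str:
    """Turn 1-based page numbers into a compact string such as '1,3-5'."""
    runs: List[Tuple[int, int]] = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            # Extend the current run of consecutive pages.
            runs[-1] = (runs[-1][0], page)
        else:
            # Start a new run.
            runs.append((page, page))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def plan_ocr(page_has_text: List[bool]) -> Dict[str, str]:
    """Split pages into 'force_ocr' (has text) and 'normal' (no text).

    page_has_text[i] is assumed to say whether page i+1 already contains
    text; how that is detected is outside this sketch.
    """
    force = [n for n, has_text in enumerate(page_has_text, start=1) if has_text]
    normal = [n for n, has_text in enumerate(page_has_text, start=1) if not has_text]
    return {
        "force_ocr": compress_ranges(force),
        "normal": compress_ranges(normal),
    }


if __name__ == "__main__":
    # Hypothetical 6-page document: pages 1, 3, 4, 5 already contain text.
    print(plan_ocr([True, False, True, True, True, False]))
```

Whether the resulting range strings can be combined with existing options (for example a page-limiting option together with force-ocr) to get this behavior today is an assumption that would need to be checked against the OCRmyPDF documentation.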
Sorry if this is already possible, but it seems that force-ocr applies to all pages when enabled. @jbarlow83 Thank you for the assist!
Bumping this, @jbarlow83. I know you must be busy, but is there a way to selectively apply force-ocr to pages that have text, and proceed normally for pages that are clean?