PDFPage.get_pages is inefficient for extracting a subset of pages #1040

Closed
dhdaines opened this issue Sep 11, 2024 · 1 comment

Comments

@dhdaines
Contributor

See ocrmypdf/OCRmyPDF#1378

The issue is that, while we may need to walk the page tree to reach a particular page by its index, we shouldn't actually parse every page unless it is specifically requested.
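To illustrate what walking the tree without touching every page could look like, here is a rough sketch. It is not pdfminer code: the tree is modelled as plain dicts, `find_page` and the `Name` key are hypothetical, and the only thing taken from the PDF format is that an intermediate `Pages` node carries a `Count` of the leaf pages beneath it, which allows whole subtrees to be skipped.

```python
def find_page(node, index):
    """Return the leaf page at zero-based `index` under `node` (hypothetical helper)."""
    if node["Type"] == "Page":
        if index == 0:
            return node
        raise IndexError("page index out of range")
    for kid in node["Kids"]:
        # A leaf counts as one page; an intermediate node reports its leaf total in Count.
        count = 1 if kid["Type"] == "Page" else kid["Count"]
        if 0 <= index < count:
            return find_page(kid, index)
        index -= count  # skip this subtree without descending into it
    raise IndexError("page index out of range")


# Toy page tree with four leaves: (a, b), c, (d). "Name" is only a label for this example.
tree = {
    "Type": "Pages",
    "Count": 4,
    "Kids": [
        {
            "Type": "Pages",
            "Count": 2,
            "Kids": [
                {"Type": "Page", "Name": "a"},
                {"Type": "Page", "Name": "b"},
            ],
        },
        {"Type": "Page", "Name": "c"},
        {"Type": "Pages", "Count": 1, "Kids": [{"Type": "Page", "Name": "d"}]},
    ],
}

third = find_page(tree, 2)  # reaches "c" without descending into the first subtree
```

Whether this is worth doing in practice depends on how expensive visiting a page actually is.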

@dhdaines
Contributor Author

Ah. Turns out that instantiating PDFPage is not really expensive at all. This is more of an API problem: people should not use get_pages this way if they are going to call it repeatedly. I will close this, since it is not really an issue.
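For anyone landing here, a minimal sketch of the usage difference. The `get_pages` below is a hypothetical stand-in, not the pdfminer function: it takes a plain list instead of a file object, and it assumes that every call walks the document from the first page and only filters what it yields. The `stats` counter and both helper names are made up for the illustration.

```python
stats = {"visited": 0}  # how many pages the stand-in has walked over


def get_pages(pages, pagenos=None):
    """Hypothetical stand-in: walk every page in order, yield only the selected ones."""
    wanted = None if pagenos is None else set(pagenos)
    for pageno, page in enumerate(pages):
        stats["visited"] += 1
        if wanted is None or pageno in wanted:
            yield page


def extract_one_call_per_page(pages, wanted):
    # Anti-pattern: one full walk of the document per requested page.
    return [page for n in wanted for page in get_pages(pages, [n])]


def extract_single_pass(pages, wanted):
    # Preferred: one walk, all requested pages selected at once (document order).
    return list(get_pages(pages, wanted))


document = [f"page {i}" for i in range(10)]
subset = [2, 5, 7]
```

With this toy document, the first helper visits 30 pages for three requested pages and the second visits 10. With the real library the cost per visited page is small, as noted above, but each separate call also repeats its own setup, so asking for the whole subset in one call is the safer habit.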
