`PDFPage.get_pages` is inefficient for extracting a subset of pages #1040

dhdaines · 2024-09-11T22:27:46Z

The issue is that, although we may need to walk the page tree to reach a particular page by its index, we shouldn't actually parse every page unless it is specifically requested.
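To illustrate what "walking the tree without parsing every page" could look like, here is a minimal sketch. It assumes a hypothetical in-memory page tree made of plain dicts, where interior `Pages` nodes carry the `Count` of leaf pages beneath them (as the PDF format specifies for page tree nodes). The names `page`, `pages`, `find_page`, and the `Label` key are made up for this example and are not pdfminer.six API; whether pdfminer.six could use `Count` this way is not established here.

```python
def page(label):
    """Build a hypothetical leaf node standing in for a single page."""
    return {"Type": "Page", "Label": label}


def pages(*kids):
    """Build a hypothetical interior node; Count is the number of leaf pages below it."""
    count = sum(k["Count"] if k["Type"] == "Pages" else 1 for k in kids)
    return {"Type": "Pages", "Kids": list(kids), "Count": count}


def find_page(node, index):
    """Return the leaf at zero-based `index`, skipping whole subtrees via Count."""
    if node["Type"] == "Page":
        if index == 0:
            return node
        raise IndexError("page index out of range")
    if not 0 <= index < node["Count"]:
        raise IndexError("page index out of range")
    for kid in node["Kids"]:
        size = kid["Count"] if kid["Type"] == "Pages" else 1
        if index < size:
            # The target is inside this child; descend into it only.
            return find_page(kid, index)
        # The target is further right; skip this child without visiting its leaves.
        index -= size
    raise IndexError("page index out of range")


root = pages(pages(page("p0"), page("p1")), page("p2"), pages(page("p3"), page("p4")))
fourth = find_page(root, 3)
```

Only the nodes on the path to the requested page are touched here; the leaves of skipped subtrees are never looked at.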

dhdaines · 2024-09-12T17:38:53Z

Ah. It turns out that instantiating `PDFPage` is not really expensive at all. This is more of an API problem: people should not use `get_pages` this way if they are going to call it repeatedly. I will close this, as it is not really an issue.
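The cost of the repeated-call pattern can be sketched with a stand-in. The `Walker` class below is hypothetical and only imitates a `get_pages`-style generator that walks every page in order and yields the requested ones; it is not pdfminer.six code. It counts how many pages are visited when the generator is called once per wanted page versus once with all wanted page numbers.

```python
class Walker:
    """Hypothetical stand-in for a document whose pages are walked in order."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.visited = 0  # how many pages the walk has touched so far

    def get_pages(self, pagenos=None):
        """Walk all pages, yielding only the zero-based indices in `pagenos`."""
        for i in range(self.num_pages):
            self.visited += 1
            if pagenos is None or i in pagenos:
                yield i


wanted = [2, 5, 7]

# Pattern to avoid: one full walk per requested page.
repeated = Walker(10)
one_by_one = []
for n in wanted:
    for p in repeated.get_pages({n}):
        one_by_one.append(p)

# Preferred: a single walk that filters for every requested page.
single = Walker(10)
in_one_pass = list(single.get_pages(set(wanted)))

print(repeated.visited, single.visited)
```

Both approaches return the same pages, but the repeated calls walk the document once per requested page. In pdfminer.six, `PDFPage.get_pages` takes a `pagenos` argument, so the single-call form should be available there as well.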

dhdaines closed this as completed Sep 12, 2024
