Releases: SamEdwardes/spacypdfreader
0.3.2
0.3.1
Changes
- Added support for the page_range argument (#16, #18). For example:

    import spacy
    from spacypdfreader import pdf_reader
    from spacypdfreader.parsers import pytesseract

    nlp = spacy.load("en_core_web_sm")
    doc = pdf_reader(
        "tests/data/test_pdf_01.pdf",
        nlp,
        pytesseract.parser,
        n_processes=4,
        page_range=(2, 3),
    )
Fixes
- Removed shed as a dependency. It was removing imports that appeared unused but were required (#17).
0.3.0
Changes
- Added support for multi-processing. For example:

    import spacy
    from spacypdfreader.parsers import pytesseract
    from spacypdfreader.spacypdfreader import pdf_reader

    nlp = spacy.load("en_core_web_sm")
    doc = pdf_reader("tests/data/test_pdf_01.pdf", nlp, pytesseract.parser, n_processes=4)

    print(doc._.first_page)
    print(doc._.last_page)
    print(doc[12].text)
    print(doc[12]._.page_number)
- Changed how parsers are implemented: each parser is now a function rather than a class. See https://github.com/SamEdwardes/spacypdfreader/tree/feature/multi-processing/spacypdfreader/parsers for examples.
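To illustrate the function-style approach, here is a minimal sketch that uses only the standard library. The signature `(pdf_path, page_number, **kwargs) -> str` is an assumption for illustration and is not taken from these release notes. `FAKE_PDF`, `dummy_parser`, and `read_pages` are hypothetical names that are not part of spacypdfreader, and the inclusive, 1-indexed page range is also an assumption of this sketch. Check the parsers directory linked above for the real signature.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a PDF file: 1-indexed page number -> page text.
FAKE_PDF: Dict[int, str] = {
    1: "First page text.",
    2: "Second page text.",
    3: "Third page text.",
}

# Assumed parser shape: a plain function that returns the text of one page.
Parser = Callable[..., str]


def dummy_parser(pdf_path: str, page_number: int, **kwargs) -> str:
    """Return the text of a single page.

    A real parser would open `pdf_path` and run a text extraction engine;
    this sketch ignores the path and looks the page up in FAKE_PDF.
    """
    return FAKE_PDF[page_number]


def read_pages(
    pdf_path: str, parser: Parser, first_page: int, last_page: int
) -> List[str]:
    """Call the parser once per page, treating the range as inclusive."""
    return [parser(pdf_path, page) for page in range(first_page, last_page + 1)]


pages = read_pages("example.pdf", dummy_parser, 2, 3)
print(pages)
```

Because the parser in this sketch is an ordinary function, swapping in a different engine only means passing a different callable with the same shape.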
Fixes
None
0.2.1
- Added examples to the API docs.
- Added deployment checklist to the docs.
0.2.0
- Added support for additional PDF-to-text extraction engines:
- Added the ability to bring your own PDF-to-text extraction engine.
- Added new spaCy extension attributes and methods:
  - doc._.page_range
  - doc._.first_page
  - doc._.last_page
  - doc._.pdf_file_name
  - doc._.page(int)
- Built a new documentation site: https://samedwardes.github.io/spaCyPDFreader/
0.1.1
What's Changed
- 0.1.1 Python ^3.7 support by @SamEdwardes in #2
New Contributors
- @SamEdwardes made their first contribution in #2
Full Changelog: https://github.com/SamEdwardes/spaCyPDFreader/commits/v0.1.1