0.3.0

@SamEdwardes SamEdwardes released this 18 May 03:32
· 19 commits to main since this release
24f9d86

Changes

  • Added support for multiprocessing via the n_processes argument of pdf_reader. For example:

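    import spacy

    from spacypdfreader.parsers import pytesseract
    from spacypdfreader.spacypdfreader import pdf_reader

    nlp = spacy.load("en_core_web_sm")
    # Parse the PDF with the pytesseract parser, using 4 processes.
    doc = pdf_reader("tests/data/test_pdf_01.pdf", nlp, pytesseract.parser, n_processes=4)
    # Document-level attributes: first and last page of the PDF.
    print(doc._.first_page)
    print(doc._.last_page)
    # Token-level attributes: the token text and the page it came from.
    print(doc[12].text)
    print(doc[12]._.page_number)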
  • Changed how parsers are implemented: each parser is now a function rather than a class. See https://github.com/SamEdwardes/spacypdfreader/tree/feature/multi-processing/spacypdfreader/parsers for examples.
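
To illustrate why a function-style parser pairs well with multiprocessing, here is a minimal, self-contained sketch. It does not use spacypdfreader itself: toy_parser, read_pages, FAKE_PDF, and the (pdf_path, page_number, **kwargs) signature are all hypothetical stand-ins chosen for illustration, so check the linked parsers directory for the real interface.

```python
from functools import partial
from multiprocessing import Pool

# Hypothetical stand-in for a PDF: page number -> page text.
FAKE_PDF = {1: "First page text.", 2: "Second page text.", 3: "Third page text."}


def toy_parser(pdf_path: str, page_number: int, **kwargs) -> str:
    """Hypothetical function-style parser: return the text of one page."""
    # A real parser would open pdf_path and extract page_number here.
    return FAKE_PDF[page_number]


def read_pages(pdf_path, parser, page_numbers, n_processes=None):
    """Parse each page with `parser`, optionally across worker processes."""
    # Bind the path so each worker only needs a page number.
    parse_page = partial(parser, pdf_path)
    if n_processes:
        # A module-level function can be pickled and sent to workers.
        with Pool(n_processes) as pool:
            return pool.map(parse_page, page_numbers)
    return [parse_page(n) for n in page_numbers]


if __name__ == "__main__":
    print(read_pages("example.pdf", toy_parser, [1, 2, 3], n_processes=2))
```

Because the parser is a plain module-level function, it can be handed to worker processes one page at a time. The `if __name__ == "__main__":` guard matters on platforms where multiprocessing starts workers by re-importing the main module.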

Fixes

None