
Python-based pipeline to prepare scanned PDFs in the DSCC collection for publication


isawnyu/dscc-pdf-pipeline


DSCC PDF Pipeline


Pipeline description

Image correction -> OCR -> PDF resizing -> Cover page addition -> Metadata embedding -> Final PDF output
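The stages run in the order listed, each consuming the previous stage's output. The minimal Python sketch below illustrates only that chaining and is not the repository's code. The names `make_stage`, `run_pipeline` and `STAGES`, and the file-suffix naming scheme, are hypothetical. Each placeholder stage just derives the next file name; it does not actually correct, OCR, resize or edit a PDF.

```python
from pathlib import Path
from typing import Callable

# Hypothetical stage signature: take the current PDF path and return the
# path of the file that stage would produce.
Stage = Callable[[Path], Path]


def make_stage(suffix: str) -> Stage:
    """Build a placeholder stage that only derives the next file name.

    A real stage would read `src` and write the returned path. This sketch
    touches no files so the ordering logic can be shown on its own.
    """
    def stage(src: Path) -> Path:
        return src.with_name(f"{src.stem}_{suffix}{src.suffix}")
    return stage


# Order taken from the pipeline description; suffixes are illustrative.
STAGES: list[tuple[str, Stage]] = [
    ("image correction", make_stage("corrected")),
    ("OCR", make_stage("ocr")),
    ("PDF resizing", make_stage("resized")),
    ("cover page addition", make_stage("cover")),
    ("metadata embedding", make_stage("meta")),
]


def run_pipeline(pdf: Path, stages: list[tuple[str, Stage]] = STAGES) -> Path:
    """Feed each stage's output into the next and return the final PDF path."""
    current = pdf
    for name, stage in stages:
        current = stage(current)
        print(f"{name}: {current.name}")
    return current


if __name__ == "__main__":
    final = run_pipeline(Path("data/input/example.pdf"))
    print("final output:", final)
```

Treating stages as interchangeable callables makes it straightforward to rerun or swap a single step (for example, a different OCR engine) without touching the rest of the chain.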

Usage

  • Place the PDF(s) to be processed in data/input
  • Add the corresponding metadata to data/metadata.csv
  • Run the pipeline: sh src/pipeline.sh
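Before running the script, it can help to confirm that every PDF in data/input has a matching metadata entry. The sketch below assumes, hypothetically, that metadata.csv has a header row and a `filename` column holding each PDF's file name. The README does not document the actual columns, so `key` would need to match the real schema. The `load_metadata` and `pending_jobs` helpers are illustrative and are not part of the repository.

```python
import csv
from pathlib import Path


def load_metadata(csv_path: Path, key: str = "filename") -> dict[str, dict[str, str]]:
    """Index metadata rows by the (assumed) column naming each PDF.

    Raises KeyError if the CSV has no column called `key`.
    """
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return {row[key]: row for row in csv.DictReader(fh)}


def pending_jobs(
    data_dir: Path, key: str = "filename"
) -> tuple[list[tuple[Path, dict[str, str]]], list[str]]:
    """Pair each PDF in data_dir/input with its row in data_dir/metadata.csv.

    Returns (jobs, missing): jobs are (PDF path, metadata row) pairs ready to
    process; missing lists PDF file names that have no metadata row.
    """
    metadata = load_metadata(data_dir / "metadata.csv", key)
    jobs: list[tuple[Path, dict[str, str]]] = []
    missing: list[str] = []
    # Sorted so results are deterministic regardless of filesystem order.
    for pdf in sorted((data_dir / "input").glob("*.pdf")):
        row = metadata.get(pdf.name)
        if row is None:
            missing.append(pdf.name)
        else:
            jobs.append((pdf, row))
    return jobs, missing
```

Called from the repository root as `pending_jobs(Path("data"))`, this would return the PDFs that are ready to process together with a list of any that still lack metadata.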

Written by Patrick J. Burns, ISAW Library; 2022-2023.
