Code for the papers "End-To-End Handwritten Text Detection and Transcription in Full Pages" and "TreyNet: A Neural Model for Text Localization, Transcription and Named Entity Recognition in Full Pages".
- Clone this repo.
- Install the required packages:
apt-get install tk-dev python-tk
- Install the python packages:
pip3 install torch==1.0.1.post2
pip3 install cffi
pip3 install cython
pip3 install opencv-python
pip3 install requests
pip3 install editdistance
Warp ctc pytorch: https://github.com/SeanNaren/warp-ctc
Python pagexml: https://github.com/omni-us/pagexml/tree/master/py-pagexml
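Before moving on to the data, it can help to confirm that every dependency is importable. The following is a minimal sketch, not part of this repo: the helper `missing_modules` is hypothetical, and the import names `warpctc_pytorch` and `pagexml` are assumptions about how the two packages linked above expose themselves.

```python
import importlib.util

# Import names for the packages listed above. The last two entries
# (warpctc_pytorch, pagexml) are assumed names for the linked bindings.
REQUIRED_MODULES = [
    "torch",
    "cffi",
    "Cython",
    "cv2",
    "requests",
    "editdistance",
    "warpctc_pytorch",
    "pagexml",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it does not detect a broken build of a compiled extension.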
Download and extract the IAM images and ground truth. You first need to register with the IAM database, then set environment variables holding your credentials.
cd datasets/iam
IAM_U=[your IAMDB user]
IAM_PW=[your IAMDB password]
wget http://www.fki.inf.unibe.ch/DBs/iamDB/data/forms/formsA-D.tgz --user $IAM_U --password $IAM_PW
wget http://www.fki.inf.unibe.ch/DBs/iamDB/data/forms/formsE-H.tgz --user $IAM_U --password $IAM_PW
wget http://www.fki.inf.unibe.ch/DBs/iamDB/data/forms/formsI-Z.tgz --user $IAM_U --password $IAM_PW
mkdir forms
tar -C forms -zxvf formsA-D.tgz
tar -C forms -zxvf formsE-H.tgz
tar -C forms -zxvf formsI-Z.tgz
wget http://www.fki.inf.unibe.ch/DBs/iamDB/data/ascii/words.txt --user $IAM_U --password $IAM_PW
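To get a feel for what `words.txt` contains, here is a small sketch that parses one entry. It assumes the commonly described layout of nine space-separated fields (word id, segmentation status, gray level, x, y, w, h, grammatical tag, transcription) with `#` comment lines; check the header of your copy of the file. `WordEntry`, `parse_words_line` and `form_id` are hypothetical names, and this is not necessarily how `generate_gt_IAM_csv.py` reads the file.

```python
from typing import NamedTuple, Optional


class WordEntry(NamedTuple):
    word_id: str     # e.g. a01-000u-00-00
    seg_ok: bool     # True when the segmentation status is "ok"
    gray_level: int  # binarization threshold of the containing line
    x: int
    y: int
    w: int
    h: int
    tag: str         # grammatical tag
    text: str        # transcription


def parse_words_line(line: str) -> Optional[WordEntry]:
    """Parse one line of words.txt; return None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    parts = line.split(" ")
    if len(parts) < 9:
        raise ValueError("expected at least 9 fields, got %d" % len(parts))
    x, y, w, h = (int(v) for v in parts[3:7])
    # Everything after the eighth field is kept as the transcription.
    text = " ".join(parts[8:])
    return WordEntry(parts[0], parts[1] == "ok", int(parts[2]), x, y, w, h, parts[7], text)


def form_id(word_id: str) -> str:
    """Map a word id such as a01-000u-00-00 to its form id a01-000u."""
    return "-".join(word_id.split("-")[:2])
```

The form id is what ties a word box back to a page image under `forms/`.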
Get the Aachen train, validation and test partitions:
wget https://raw.githubusercontent.com/jpuigcerver/PyLaia/refactor_kws_egs_master/egs/iam-htr/data/part/forms/aachen/tr.lst
wget https://raw.githubusercontent.com/jpuigcerver/PyLaia/refactor_kws_egs_master/egs/iam-htr/data/part/forms/aachen/te.lst
wget https://raw.githubusercontent.com/jpuigcerver/PyLaia/refactor_kws_egs_master/egs/iam-htr/data/part/forms/aachen/va.lst
python generate_gt_IAM_csv.py words.txt
cd ../..
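After downloading the partition lists, a quick sanity check is to confirm that the three splits do not share any forms. The sketch below assumes each `.lst` file holds one form id per line; `read_ids` and `check_partitions` are hypothetical helpers, not functions from this repo.

```python
from pathlib import Path


def read_ids(path):
    """Read a .lst file and return its non-empty lines as a set of ids."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def check_partitions(train_path, valid_path, test_path):
    """Return (sizes, overlaps) for the three partition files.

    sizes maps each split name to its number of ids; overlaps maps a pair
    of split names to the ids they share and is empty when the splits
    are disjoint.
    """
    parts = {
        "train": read_ids(train_path),
        "valid": read_ids(valid_path),
        "test": read_ids(test_path),
    }
    names = list(parts)
    overlaps = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            common = parts[first] & parts[second]
            if common:
                overlaps[(first, second)] = common
    sizes = {name: len(ids) for name, ids in parts.items()}
    return sizes, overlaps
```

From the repository root, this could be called as `check_partitions("datasets/iam/tr.lst", "datasets/iam/va.lst", "datasets/iam/te.lst")`.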
mkdir trained_models
echo "text,0" > binary_class.csv
source experiments/set_path.sh
./train_e2e.sh
./test_e2e.sh
./predict_e2e.sh
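The `binary_class.csv` file created above holds a single `name,id` pair, `text,0`, which declares one detection class. As an illustration of that format, the sketch below reads such a file into a dictionary. `load_classes` is a hypothetical helper; the repo's own dataloader may parse the file differently.

```python
import csv


def load_classes(path):
    """Read a class file with one `name,id` row per line into a dict."""
    classes = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            name, class_id = row
            classes[name] = int(class_id)
    return classes
```

For the file written by the `echo` command, `load_classes("binary_class.csv")` would give a mapping with the single key `text`.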
mkdir -p trained_models
wget -O trained_models/iam_join_det_htr_csv_retinanet.pt "https://github.com/manucarbonell/models/blob/master/research-e2e-pagereader/iam_join_det_htr_lstm_csv_retinanet.pt?raw=true"
./predict_e2e.sh
Download and extract IEHHR images and ground truth from https://rrc.cvc.uab.es/?ch=10&com=downloads.
Convert the PageXML ground truth to CSV so that the dataloader can read it.
python3 pagexml2csv.py --pxml_dir datasets/esposalles/train --fout datasets/esposalles/train.csv --classes_out classes.csv --get_property True --seg_lev TextLine
python3 pagexml2csv.py --pxml_dir datasets/esposalles/valid --fout datasets/esposalles/valid.csv --classes_out classes.csv --get_property True --seg_lev TextLine
python3 pagexml2csv.py --pxml_dir datasets/esposalles/test --fout datasets/esposalles/test.csv --classes_out classes.csv --get_property True --seg_lev TextLine
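To illustrate what a conversion at the `TextLine` level involves, here is a rough sketch that pulls the bounding box and transcription of each text line out of a PAGE XML document. It uses only the standard library rather than py-pagexml, ignores the properties requested by `--get_property`, and is not the code of `pagexml2csv.py`. The function name `text_lines` and the returned tuple layout are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def text_lines(xml_string):
    """Return (x1, y1, x2, y2, text) for every TextLine in a PAGE XML string."""
    root = ET.fromstring(xml_string)
    results = []
    for element in root.iter():
        if _local(element.tag) != "TextLine":
            continue
        # Only direct children are inspected, so Word-level Coords and
        # TextEquiv elements nested inside the line are not picked up.
        coords = next((c for c in element if _local(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue
        points = [
            tuple(int(float(v)) for v in pair.split(","))
            for pair in coords.get("points").split()
        ]
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        text = ""
        for child in element:
            if _local(child.tag) != "TextEquiv":
                continue
            unicode_el = next((u for u in child if _local(u.tag) == "Unicode"), None)
            if unicode_el is not None and unicode_el.text:
                text = unicode_el.text
        results.append((min(xs), min(ys), max(xs), max(ys), text))
    return results
```

The polygon in `Coords` is reduced to its axis-aligned bounding box, which is the form a box detector usually expects.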
Run experiment
./experiments/run.sh esposalles