
PETGUI

📰 News

We are excited to share that we presented and published our PETGUI poster this year at the 69th annual conference of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS) in Dresden, Germany:

Fig. 1 - Our PETGUI poster at the 69th annual GMDS conference.
We present PETGUI (Pattern-Exploiting Training GUI), a user-friendly graphical user interface for training, testing, and labeling with pre-trained masked language models using Pattern-Exploiting Training, a state-of-the-art machine learning framework for text classification that combines few-shot learning and prompting. Concretely, PETGUI facilitates a multistep pipeline of training and testing on labeled data, followed by annotating unlabeled data, in a comprehensible and intuitive way. PETGUI also provides valuable insights into various aspects of training, such as statistics on label distribution and model performance. We envision our app as a pivotal use case of a simple machine learning application that is accessible and manageable by users without domain-specific knowledge, in our case physicians in clinical routine.

🔎 Contents

  • Pattern-Exploiting Training
  • PETGUI Requirements
  • Installation
  • Start PETGUI
  • Run PETGUI
  • Stop PETGUI
  • Features
  • Limitations
  • References

Pattern-Exploiting Training

PET (Pattern-Exploiting Training) is a semi-supervised training strategy for language models. By reformulating input examples as cloze-style phrases, it has been shown to significantly outperform standard supervised training (Schick et al., 2021), which makes it especially valuable for low-resource settings such as the German clinical domain (Richter-Pechanski et al., 2023).

Fig. 2 - Illustration of the PET workflow, see Schick et al., 2021.
In this illustration, the pattern "It was ___ ." is a cloze-style phrase that textually explains the task to the model, in this case sentiment classification.
PET then works in three steps: (1) a pretrained language model is fine-tuned on each such pattern; (2) an ensemble of these models annotates unlabeled training data; (3) a classifier is trained on the resulting soft-labeled dataset.
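
To make the cloze idea concrete, here is a minimal, hedged sketch of pattern-plus-verbalizer scoring with a masked language model. It assumes the Hugging Face transformers package; the English model and verbalizer words are illustrative, not PETGUI defaults:

# Sketch: score verbalizer tokens for the cloze pattern "It was ___ ."
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

review = "The movie was a complete waste of time."
# (1) Pattern: reformulate the input as a cloze-style phrase.
cloze = f"{review} It was {fill.tokenizer.mask_token}."

# (2) Verbalizer: map single vocabulary tokens to class labels.
verbalizer = {"bad": "negative", "good": "positive"}

# Restrict scoring to the verbalizer tokens; the higher score wins.
for result in fill(cloze, targets=list(verbalizer)):
    print(verbalizer[result["token_str"]], round(result["score"], 4))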

🧰 PETGUI Requirements

  • A Linux host system
  • A connection to a remote Slurm cluster with GPUs, accessible via LDAP
  • Docker=1.5-2
  • Python=3.11
  • Torch=2.1.1 (on the remote Slurm cluster)

To run PETGUI on your machine, you need:

  1. A working connection to a remote Slurm cluster.
  2. LDAP credentials for accessing the remote Slurm cluster.
  3. The CA certificate file for the remote Slurm cluster.
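
Before proceeding, you can verify that the cluster is reachable and runs Slurm. A minimal sketch, assuming the paramiko package; the host name and credentials are placeholders:

# Sketch: check SSH access to the Slurm login node and list its partitions.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("cluster.ORGANISATION-NAME.org",
               username="YOUR_USER", password="YOUR_PASSWORD")

# `sinfo` lists the available partitions if Slurm is installed on the host.
_, stdout, _ = client.exec_command("sinfo")
print(stdout.read().decode())
client.close()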

Installation

  1. In a terminal, git clone the repository and change directory into it.
  2. Adapt the SBATCH lines of train.sh and predict.sh to the Slurm configuration of your remote cluster:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:pascal:1
#SBATCH -n 1
#SBATCH -N 1
#SBATCH -c 2
#SBATCH --mem=16G
#SBATCH --job-name=petgui
  3. Adapt conf.yaml to the LDAP server specifications of your remote Slurm cluster (the sketch after this list shows how such values are typically used):
"CLUSTER_NAME" : "cluster.ORGANISATION-NAME.org"

"LDAP_SERVER" : 'ldap://ldap2.ORGANISATION-NAME.org'
"CA_FILE" : 'ORGANISATION-NAME_CA.pem'
"USER_BASE" : 'dc=ORGANISATION-NAME,dc=org'
"LDAP_SEARCH_FILTER" : '({name_attribute}={name})'

  4. Move the server's CA certificate file to the /conf directory (example reference file).
  5. Build the Docker image: docker build . -t petgui
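
How PETGUI uses these values internally may differ, but the conf.yaml entries above map naturally onto an LDAP bind. A minimal sketch, assuming the ldap3 package; the user DN is illustrative:

import ssl
from ldap3 import Connection, Server, Tls

# CA_FILE secures the connection; LDAP_SERVER is the directory host.
tls = Tls(validate=ssl.CERT_REQUIRED,
          ca_certs_file="conf/ORGANISATION-NAME_CA.pem")
server = Server("ldap://ldap2.ORGANISATION-NAME.org", tls=tls)

# USER_BASE plus the search filter locate the user entry; uid is an assumption.
conn = Connection(server, user="uid=YOUR_USER,dc=ORGANISATION-NAME,dc=org",
                  password="YOUR_PASSWORD")
conn.open()
conn.start_tls()   # upgrade to TLS using the CA file
print("bind ok:", conn.bind())
conn.unbind()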

🛫 Start PETGUI

  1. Change directory to the repository: cd /PETGUI
  2. Run the docker container: docker run --name petgui -p 89:89 --mount type=bind,source=./conf,target=/home/appuser/conf petgui

INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:89 (Press CTRL+C to quit)

  3. Open http://localhost:89 in a browser.
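
If the browser shows nothing, you can first confirm that the container is serving. A standard-library sketch; the port matches the docker run command above:

# Sketch: probe the PETGUI endpoint from the host.
import urllib.request

with urllib.request.urlopen("http://localhost:89", timeout=5) as resp:
    print("PETGUI is up, HTTP status:", resp.status)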

👟️ Run PETGUI

You have successfully started PETGUI! To run it, follow the steps below.

  1. Log in with the LDAP credentials for your remote Slurm cluster.
  2. Input the training parameters; for the German few-shot sample data:
     SAMPLE: 1, LABEL: 0
     TEMPLATE: Es war _ . (include the underscore character "_" as a separator in the template; click "+" to add more patterns)
     VERBALIZER: 1 schlecht, 2 gut
     Choose one of the pre-defined language models: gbert-base or medbert. Click View Data to get statistics on your data as label distribution plots.
  3. Click Submit to proceed, then Start Training to start the model training. You may Abort the process, which terminates training and returns you to step 2.
  4. Click Show Results to view the model results, displaying accuracy per pattern as well as precision, recall, F1-measure, and support per label.
  5. Choose to either re-train with new parameters (Run with new configuration) or continue with the trained model to label unseen data (Annotate unseen data).
  6. Test the model on evaluation data (sample data): upload unlabeled data as a CSV file and make sure the first column is empty in every row (see the sketch below). Predict Labels Using PET Model starts the prediction process. When it completes, click Download Predicted Data.
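
For step 6, the expected input is a CSV whose first column is empty in every row. A minimal standard-library sketch of producing such a file; the sentences and file name are illustrative:

# Sketch: write an unlabeled CSV with an empty first column throughout.
import csv

sentences = [
    "Der Patient zeigt keine Beschwerden.",
    "Die Wunde heilt schlecht.",
]
with open("unlabeled.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for s in sentences:
        writer.writerow(["", s])  # first column stays empty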

Stop PETGUI

In the terminal, press Ctrl + C to stop the running uvicorn process:

^CINFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [1]

To restart PETGUI, in the terminal:

  1. docker stop petgui
  2. docker rm petgui
  3. Re-run the docker run command from step 2 of Start PETGUI.

➕ Features

PETGUI provides an intuitive GUI for the PET workflow. Concretely, with PETGUI you can:

  • Display statistics on label distribution of the training data
  • Train either bert-base-cased or medbert-512 on a labeled dataset
  • Display statistics on the model performance
  • Test the trained model to generate predictions on unseen data
  • Download the labeled file

➖ Limitations

In its current form, PETGUI is bound by the following requirements, which we may further simplify in future work:

  • Connection to remote Slurm cluster: You must have a working connection to a remote Slurm cluster.
  • File format and naming convention: The provided training data must be a tar.gz file containing train.csv, test.csv and unlabeled.csv, like our sample training data. The evaluation data must be a comma-separated .txt file with the first column empty throughout, like our sample test data.
  • Verbalizer mapping: The tokenizer splits words into sub-words, e.g. "Langeswort" becomes "Langes" and "##wort". The provided verbalizer has to map to a single input ID, hence the user must provide a token from the model vocabulary (see the sketch below for a quick check).
    We plan to add user feedback that ensures correct input.
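
To check a verbalizer candidate before entering it, you can confirm that it maps to a single input ID in the model vocabulary. A hedged sketch, assuming the Hugging Face transformers package; the model name and words are illustrative:

# Sketch: report whether each candidate verbalizer is a single token.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-german-cased")

for word in ["gut", "schlecht", "Langeswort"]:
    ids = tok.encode(word, add_special_tokens=False)
    status = "ok" if len(ids) == 1 else "splits into sub-words"
    print(word, tok.convert_ids_to_tokens(ids), status)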

🗃️ References

  1. Schick, T., & Schütze, H. (2021). Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference. arXiv:2001.07676.
  2. Schick, T. (2023). Pattern-Exploiting Training (PET). GitHub repository.
  3. Richter-Pechanski, P., Wiesenbach, P., Schwab, D. M., Kiriakou, C., He, M., Geis, N. A., Frank, A., & Dieterich, C. (2023). Few-Shot and Prompt Training for Text Classification in German Doctor's Letters. Stud Health Technol Inform, 302, 819-820. doi:10.3233/SHTI230275. PMID: 37203504.
  4. Gesundheit – gemeinsam. Joint conference of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutsche Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutsche Gesellschaft für Epidemiologie (DGEpi), Deutsche Gesellschaft für Medizinische Soziologie (DGMS), and Deutsche Gesellschaft für Public Health (DGPH), September 8-13, 2024, Dresden, Germany.
