This repo is for training and testing a breast cancer (BRCA) detection pipeline using 3 standard CNNs: VGG16, ResNet-34, and Inception-v4. More details can be found in the paper: Utilizing Automated Breast Cancer Detection to Identify Spatial Distributions of Tumor Infiltrating Lymphocytes in Invasive Breast Cancer
NOTE: download the trained models here, then extract the model files to data/models_cnn
The default setting is ResNet-34, since it performs best on the public test set. To use another model, change the variable "MODEL" in conf/variables.sh to the name of the model downloaded from Google Drive above.
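For reference, the relevant settings in conf/variables.sh might look like the sketch below; the values are placeholders, not actual file names from this repo:

```bash
# conf/variables.sh (sketch; replace the placeholder values with your own paths/names)
BASE_DIR="/path/to/quip_cancer_segmentation"   # root of the cloned repo
MODEL="name_of_the_downloaded_model_file"      # a file placed in data/models_cnn
```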
- PyTorch 0.4.0
- torchvision 0.2.0
- OpenCV (cv2) 3.4.1
- OpenSlide 1.1.1
- scikit-learn
- PIL (Pillow)
More details are in the file brca_environ.txt
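A minimal environment setup might look like the sketch below; package names follow the dependency list above, but exact builds (CUDA support, OpenSlide system libraries) depend on your machine, so treat this as a starting point rather than a verified recipe:

```bash
# Sketch only: versions follow the dependency list above;
# system-level OpenSlide libraries may also be required on your distro.
pip install torch==0.4.0 torchvision==0.2.0
pip install "opencv-python==3.4.*" openslide-python==1.1.1
pip install scikit-learn pillow
```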
- Code for both training and testing is in the folder scripts
- Set up the folder paths for the training data and the model; all parameters are stored in conf/variables.sh
- Change BASE_DIR to the path of the folder where you cloned this git repo
- Change DATA_PATH to the folder that contains all the training subfolders
- Change DATA_LIST to the name of the text file that lists the subfolders used for training and validation; the 1st line is used for validation, the rest for training. An example of the list is tumor_data_list_toy.txt (see also the sketch after this list)
- Run a demo training that uses a subset of the training data: python train_cancer_cnn_Resnet_pretrained.py
- To run a full training that uses all the training data, remove lines 107-111 in "train_cancer_cnn_Resnet_pretrained.py", then run python train_cancer_cnn_Resnet_pretrained.py
- Log files are in data/log
- Trained models are in "checkpoint"
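For illustration, a toy DATA_LIST file and the demo training command might look like the sketch below; the subfolder names are hypothetical, and the script is assumed to live in the scripts folder:

```bash
# Hypothetical DATA_LIST contents (one subfolder of DATA_PATH per line):
#   val_slides_fold      <- 1st line: used for validation
#   train_slides_fold_1  <- remaining lines: used for training
#   train_slides_fold_2
cd scripts                                     # assuming the training script is here
python train_cancer_cnn_Resnet_pretrained.py   # demo training on a subset of the data
```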
- Change MODEL in conf/variables.sh to the name of the model stored in data/models_cnn
- Copy all .svs files to data/svs
- For example, cd to your data/svs, run "cp /data01/shared/hanle/svs_tcga_seer_brca/TCGA-3C-AALI-01Z-00-DX2.svs ."
- Patch extraction: go to folder "patch_extraction_cancer_40X", run "nohup bash start.sh &"
- Prediction: go to folder "prediction", run "nohup bash start.sh &"
- Generate JSON heatmap files: go to folder "heatmap_gen", run "nohup bash start.sh &"
- Outputs are in data/heatmap_txt and data/heatmap_jsons
- To run all of the above steps (patch extraction, prediction, and JSON heatmap generation) at once, go to folder "scripts" and run "bash svs_2_heatmap.sh"; a condensed sketch follows below
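A condensed sketch of the testing workflow above; BASE_DIR stands for your repo root, and the .svs path is the example slide mentioned earlier:

```bash
# Sketch of the end-to-end testing workflow
cd $BASE_DIR/data/svs
cp /data01/shared/hanle/svs_tcga_seer_brca/TCGA-3C-AALI-01Z-00-DX2.svs .   # example slide
cd $BASE_DIR/scripts
bash svs_2_heatmap.sh        # patch extraction + prediction + JSON heatmap generation
# Outputs land in data/heatmap_txt and data/heatmap_jsons
```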
- Go to folder "download_heatmap/get_grayscale_heatmaps", run "bash start.sh"
- Results are stored in download_heatmap/get_grayscale_heatmaps/grayscale_heatmaps and data/grayscale_heatmaps
- Compare the grayscale heatmaps with those on the website: https://mathbiol.github.io/tcgatil/
A Docker image is available at: pytorch docker
Create a folder named "data" with the subfolders below:
- Change the BASE_DIR setting in conf/variables.sh to the path of your working directory
- data/svs: to contain *.svs files
- data/training_data: to contain training data
- data/patches: to contain output from patch extraction
- data/log: to contain log files
- data/heatmap_txt: to contain prediction output
- Run "bash create_container.sh" to create container for the docker
- Run "bash start_interactive_bash.sh" to start the docker workspace
- Clone codes from this repository to workspace of docker.
- run: "mv quip_cancer_segmentation/* ."
- Follow instructions for Training and Testing as below.
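A condensed sketch of the Docker workflow above; the repository URL is a placeholder:

```bash
# Sketch of the Docker setup steps (the repository URL is a placeholder)
mkdir -p data/svs data/training_data data/patches data/log data/heatmap_txt data/models_cnn
bash create_container.sh             # create the Docker container
bash start_interactive_bash.sh       # open an interactive shell in the container
# Inside the container workspace:
git clone <this_repository_url> quip_cancer_segmentation
mv quip_cancer_segmentation/* .
```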
@article{le2020utilizing,
title={Utilizing Automated Breast Cancer Detection to Identify Spatial Distributions of Tumor Infiltrating Lymphocytes in Invasive Breast Cancer},
author={Le, Han and Gupta, Rajarsi and Hou, Le and Abousamra, Shahira and Fassler, Danielle and Torre-Healy, Luke and Moffitt, Richard A and Kurc, Tahsin and Samaras, Dimitris and Batiste, Rebecca and others},
journal={The American Journal of Pathology},
year={2020},
publisher={Elsevier}
}