This repo contains the code to do data/MC correction using chained quantile regression and stochastic matching.
The class quantileRegression_chain can be used to correct a set of continuous variables differentially while preserving their correlations. The class quantileRegression_chain_disc can be used to correct discontinuous variables.
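To illustrate the idea behind a quantile-based correction, here is a minimal one-dimensional sketch in plain Python. It is not the code of this repo: it uses unconditional empirical CDFs on toy Gaussian samples, whereas the classes above estimate conditional quantiles with BDTs and chain several variables. The function names (`empirical_cdf`, `empirical_quantile`, `morph`) and the toy distributions are assumptions made for illustration only.

```python
import bisect
import random


def empirical_cdf(sorted_sample, x):
    """Fraction of sample values that are <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def empirical_quantile(sorted_sample, p):
    """Sample value at probability p (simple rank lookup, clamped to the last entry)."""
    idx = min(int(p * len(sorted_sample)), len(sorted_sample) - 1)
    return sorted_sample[idx]


def morph(x_mc, mc_sorted, data_sorted):
    """Map a simulated value onto the data distribution: F_data^-1(F_mc(x))."""
    return empirical_quantile(data_sorted, empirical_cdf(mc_sorted, x_mc))


rng = random.Random(0)
mc = sorted(rng.gauss(0.0, 1.0) for _ in range(20000))    # toy "simulation"
data = sorted(rng.gauss(0.5, 1.2) for _ in range(20000))  # toy "data"

# Correct every simulated value; the result should follow the data distribution.
corrected = sorted(morph(x, mc, data) for x in mc)
```

After the mapping, the quantiles of `corrected` line up with those of `data`, which is the property the correction aims for. Stochastic matching for discontinuous variables is not covered by this sketch.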
To train the BDTs used to extract the conditional PDFs, use the functions trainOnData for data and trainOnMC for MC. For example:
import quantileRegression_chain as qRegC
qRC = qRegC.quantileRegression_chain(year,EBEE,workDir,variables)
qRC.loadDataDF(df_name,0,stop,rsh,columns)
qRC.trainOnData(variable,weightsDir)
The strategy for training on a large dataset is the following:
-
Train on data
To train on data, use scripts/run_qRC_training.sh:
./run_qRC_training.sh <config_file_ShowerShapes>.yaml <config_file_PhotonIso>.yaml <config_file_ChargedIsos>.yaml <n_evts> <EB/EE>
This submits one job per quantile per variable to the SGE queue via qsub. BEWARE: this script contains a hard-coded path; change it to match your setup before running.
-
Train Shower Shapes on MC
To train the shower shape correction for MC, use training/train_qRC_MC.py. The training on data must be completely finished before the training on MC is started:
python train_qRC_MC.py -c <config_file_ShowerShapes>.yaml -N <n_evts> -E <EB/EE> -B <cluster_profile> -i <cluster_id>
-
Train Isolations on MC
To train the isolation corrections for MC, use training/train_qRC_I_MC.py:
python train_qRC_I_MC.py -c <config_file_(PhotonIso/ChargedIsos)>.yaml -N <n_evts> -E <EB/EE> -B <cluster_profile> -i <cluster_id>
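For the data-training step in the list above, which submits one job per quantile per variable, the following sketch shows how the number of jobs scales. The variable names and the quantile list are hypothetical placeholders, not values taken from this repo; the actual ones come from the config files and the training script.

```python
from itertools import product


def enumerate_jobs(variables, quantiles):
    """Return one (variable, quantile) pair per job."""
    return list(product(variables, quantiles))


# Hypothetical inputs, for illustration only.
variables = ["var_a", "var_b", "var_c"]
quantiles = [0.25, 0.5, 0.75]

jobs = enumerate_jobs(variables, quantiles)
print(f"{len(jobs)} jobs would be submitted")
```

The job count is the product of the two list lengths, so adding variables or quantiles increases the load on the queue accordingly.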
After validating the initial training, one can train the final single regressors that can be used to apply the corrections to the simulation in production. To do so, follow these steps:
-
Train the final shower shape corrections
To train the final shower shape correction, use training/train_final_Reg_SS.py:
python train_final_Reg_SS.py -c <config_file_ShowerShapes>.yaml -N <n_evts> -E <EB/EE> -B <ipython_cluster_profile> -i <cluster_id> -n 21
-
Train final charged Iso corrections
To train the final correction for the charged isolations, use training/train_final_Reg_Iso.py:
python train_final_Reg_Iso.py -c <config_file_(ChargedIsos)>.yaml -N <n_evts> -E <EB/EE> -B <ipython_cluster_profile> -i <cluster_id> -n 21
-
Train final photon Iso corrections
To train the final correction for the photon isolation, use training/train_final_Reg_Iso.py:
python train_final_Reg_Iso.py -c <config_file_(PhotonIso)>.yaml -N <n_evts> -E <EB/EE> -B <ipython_cluster_profile> -i <cluster_id> -n 21
The only difference between the commands for the charged and photon isolations is the config file.
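Since the three final-training commands differ only in script name and config file, they can be assembled by a small helper. The sketch below is not part of the repo; the function name `final_training_cmd`, the config file name, and the profile and cluster id values are hypothetical. The flags are the ones shown in the commands above, with `-n 21` kept as the default because that is the value used there.

```python
import shlex


def final_training_cmd(script, config, n_evts, region, profile, cluster_id, n=21):
    """Build the argument list for one of the final-training scripts."""
    if region not in ("EB", "EE"):
        raise ValueError(f"region must be 'EB' or 'EE', got {region!r}")
    return [
        "python", script,
        "-c", config,
        "-N", str(n_evts),
        "-E", region,
        "-B", profile,
        "-i", str(cluster_id),
        "-n", str(n),
    ]


# Hypothetical values, for illustration only.
cmd = final_training_cmd(
    "train_final_Reg_Iso.py", "config_PhotonIso.yaml",
    1000000, "EB", "my_profile", "my_cluster",
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` when run from the training directory; printing it with `shlex.join` first makes it easy to check the command before launching.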
In general, the config files for data and simulation, for both the initial and the final training, have the same format. Examples can be found in examples. The following keywords should be used:
Keyword | Used for |
---|---|
Dataframes | Entries named (data/mc)_(EB/EE), giving the dataframe for data/MC in EB/EE |
variables | The list of variables to be corrected. The order here is important |
year | The year of data-taking the relevant datasets are from |
workDir | The path to the working directory. The dataframes and weightsDir need to be inside it |
weightsDir | Directory in which to store the weights. Create it before training |
outDir | Directory in which to store the final weights. Create it before training |
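Because weightsDir and outDir have to exist before training, a quick check of a loaded config can save a failed job. The sketch below is not part of the repo. It assumes the config has already been read into a Python dict, that the keywords are spelled exactly as in the table above, and that weightsDir and outDir are given relative to workDir (the table only states this for weightsDir). The helper name `check_config` is hypothetical.

```python
import os
import tempfile

# Keywords as spelled in the table above (assumed to match the config files).
REQUIRED_KEYS = ("Dataframes", "variables", "year", "workDir", "weightsDir", "outDir")


def check_config(cfg):
    """Return a list of problems found in the config; empty if none were found."""
    problems = [f"missing keyword: {key}" for key in REQUIRED_KEYS if key not in cfg]
    if problems:
        return problems
    if not cfg["variables"]:
        problems.append("variables is empty")
    for key in ("weightsDir", "outDir"):
        # Assumption: both directories are resolved relative to workDir.
        path = os.path.join(cfg["workDir"], cfg[key])
        if not os.path.isdir(path):
            problems.append(f"{key} does not exist: {path}")
    return problems


# Toy example: weightsDir exists, outDir was forgotten.
with tempfile.TemporaryDirectory() as work_dir:
    os.mkdir(os.path.join(work_dir, "weights"))
    cfg = {
        "Dataframes": {"data_EB": "df_data_EB.h5", "mc_EB": "df_mc_EB.h5"},
        "variables": ["var_a", "var_b"],  # hypothetical names; order matters
        "year": 2017,
        "workDir": work_dir,
        "weightsDir": "weights",
        "outDir": "final_weights",
    }
    problems = check_config(cfg)

print(problems)
```

In this toy example the only reported problem is the missing outDir. The dataframe file names and variable names are placeholders.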