Analysis of DHFR deep mutational scanning data collected in varied TYMS genetic backgrounds

reynoldsk/dhfr-tyms-epistasis

********************************************************************************
This directory contains the data and python notebooks necessary to reproduce the
analysis in:

Nguyen T.N., Ingle C., Thompson S.M. and Reynolds K.A. "The genetic landscape of a
metabolic interaction"


Copyright (C) 2022 Kimberly A. Reynolds
This collection of python notebooks is free software distributed under the BSD
3-clause license; please see the file LICENSE for details.
********************************************************************************

This code is written in Python 3 and makes use of numpy (v1.20.1 was used in development),
matplotlib (3.3.4), pandas (1.2.4), scipy (1.6.2), pickle (0.7.5), scikit-learn (0.24.1),
biopython (1.78), and seaborn (0.11.1). It has been tested on macOS 13 but has no special
dependencies on hardware or OS.

It is presented as a series of four Jupyter notebooks.
We suggest installing the Anaconda distribution of python for scientific computing as
one route to obtaining the necessary dependencies.

Once Anaconda has been installed, the Jupyter notebooks can be run following standard usage. Running
each cell in order will repeat the calculations and generate the figures in the main text of
Nguyen et al. Most notebooks are expected to run in a few minutes on a standard laptop
computer; however, the global model fit in notebook 4 is expected to take on the order of hours.
For this reason, we also provide the full model fit (for import) in a pickle database.
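
As a rough sketch of how a saved fit can be reused instead of rerunning the fit, the
snippet below round-trips an object through pickle. The helper names (save_fit, load_fit),
the file name, and the dictionary contents are all hypothetical placeholders; the actual
fit object and its location are defined in notebook 4 and the Output/ directory.

```python
import pickle
import tempfile
from pathlib import Path


def save_fit(fit, path):
    """Write a model-fit object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(fit, handle)


def load_fit(path):
    """Read a model-fit object back from a pickle file (trusted files only)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with a stand-in dictionary; the real fit object and its
# file name are defined in notebook 4 and may look quite different.
example_fit = {"params": [0.1, 2.5], "note": "placeholder values"}
with tempfile.TemporaryDirectory() as tmp:
    fit_path = Path(tmp) / "example_fit.pkl"  # hypothetical file name
    save_fit(example_fit, fit_path)
    restored = load_fit(fit_path)
print(restored == example_fit)
```

Pickle files can execute arbitrary code when loaded, so only load files from a source
you trust, such as this repository.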

The code provides an analysis of our DHFR deep mutational scanning data in three TYMS
backgrounds (WT, Q33S, R166Q). The analysis starts with "counts" of each mutant for
specific combinations of time point, replicate, TYMS background, and DHFR sublibrary,
as measured by next-generation sequencing during a selection time course. The code
calculates growth rates and epistasis from these data.
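
To make the counts-to-epistasis idea concrete, here is a minimal sketch under simplifying
assumptions: relative growth rate is taken as the slope of log2(mutant/WT counts) over
time, and epistasis as the difference in that rate between two TYMS backgrounds. The
function names and the toy counts are invented for illustration; the normalization,
filtering, and error handling actually used are in notebooks 2 and 3 and may differ.

```python
import numpy as np


def relative_growth_rate(mut_counts, wt_counts, times):
    """Slope of log2(mutant / WT) versus time, from a least-squares line fit."""
    mut = np.asarray(mut_counts, dtype=float)
    wt = np.asarray(wt_counts, dtype=float)
    t = np.asarray(times, dtype=float)
    log_ratio = np.log2(mut / wt)
    slope, _intercept = np.polyfit(t, log_ratio, 1)
    return slope


def epistasis(rate_in_mutant_bg, rate_in_wt_bg):
    """Additive epistasis: effect in the mutant background minus effect in WT."""
    return rate_in_mutant_bg - rate_in_wt_bg


# Toy data (not from the experiment): one DHFR mutant in two TYMS backgrounds.
times = [0, 4, 8, 12]                      # hours, assumed sampling times
wt_counts = [1000, 1000, 1000, 1000]       # WT DHFR reference counts
mut_in_wt_tyms = [1000, 500, 250, 125]     # mutant is depleted in WT TYMS
mut_in_mut_tyms = [1000, 1000, 1000, 1000] # mutant is neutral in mutant TYMS

rate_wt_bg = relative_growth_rate(mut_in_wt_tyms, wt_counts, times)
rate_mut_bg = relative_growth_rate(mut_in_mut_tyms, wt_counts, times)
print(rate_wt_bg, rate_mut_bg, epistasis(rate_mut_bg, rate_wt_bg))
```

In this toy case the mutation is deleterious with WT TYMS and neutral with the mutant
TYMS, so the epistasis value is positive.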

The notebooks also build a model for how DHFR and TYMS biochemical parameters relate
to E. coli growth rate, use a limited subset of the experimental data to fit the model,
and subsequently examine the correspondence between the model and the remainder of the data.
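
For readers unfamiliar with the Koshland-Goldbeter-like cycle used in notebook 1, the
sketch below evaluates the classic Goldbeter-Koshland function, which gives the
steady-state fraction of a substrate in its modified form when two opposing enzymes
interconvert it. This is the textbook function only; the parameter names are ours, and
the model in 1_KGModel.ipynb is parameterized with DHFR and TYMS kinetics and may differ
in form.

```python
import math


def goldbeter_koshland(v1, v2, j1, j2):
    """Steady-state modified fraction for a two-enzyme interconversion cycle.

    v1, v2: maximal rates of the forward and reverse enzymes.
    j1, j2: their Michaelis constants, scaled by total substrate.
    """
    b = v2 - v1 + j1 * v2 + j2 * v1
    discriminant = b * b - 4.0 * (v2 - v1) * v1 * j2
    return 2.0 * v1 * j2 / (b + math.sqrt(discriminant))


# With balanced enzymes the substrate splits evenly between the two forms.
balanced = goldbeter_koshland(1.0, 1.0, 0.1, 0.1)

# With small Michaelis constants the response is switch-like: a twofold
# change in the rate ratio moves the cycle from mostly off to mostly on.
mostly_on = goldbeter_koshland(2.0, 1.0, 0.01, 0.01)
mostly_off = goldbeter_koshland(0.5, 1.0, 0.01, 0.01)
print(balanced, mostly_on, mostly_off)
```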

Contents:

        .gitignore              specifies intentionally untracked files that git should ignore
                                (not track during commits)

        README                  this file

        LICENSE                 BSD 3-clause license

        Figs/                   directory of output figures generated by the python notebooks

        Input/                  non-sequencing based input data: PDB files, plate reader data, and
                                kinetic parameters for experimentally characterized DHFR and TYMS mutants

        Output/                 Output data in the form of tab-delimited text files and pickle files
                                Includes growth rates, epistasis, and best-fit model parameters

        Tables/                 Kinetic parameters for experimentally characterized DHFR and TYMS mutants
                                (formatted as excel tables)

        210514_mutantCounts/    The mutant counts used for inferring growth rate, and then epistasis
                                More details on the generation of the counts files can be found in
                                210514_mutantCounts/README

        1_KGModel.ipynb         First Jupyter notebook in the series. This notebook constructs a
                                biochemistry-to-phenotype model by approximating the paired oxidation/reduction
                                reactions (catalyzed by DHFR/TYMS) as a Koshland-Goldbeter-like cycle.

        2_DMSGrowthRates.ipynb  Second Jupyter notebook in the series. This notebook reads in the raw counts data,
                                calculates relative growth rates for all mutations in the experiment, and examines
                                the distribution of fitness effects (DFE) across TYMS backgrounds.

        3_Epistasis.ipynb       Third Jupyter notebook in the series. This notebook reads the growth rates from
                                2_DMSGrowthRates and computes epistasis. The code assesses the statistical
                                significance of individual epistatic measurements and clusters DHFR positions
                                by epistatic profile so that we can identify potential structural patterns.

        4_ModelAndDMSData.ipynb Fourth and final Jupyter notebook in the series. Examines the correspondence between
                                the biochemistry-to-phenotype model (as developed in 1_KGModel) and the complete
                                dataset.

        plot_style.py           small python module used to set figure appearance and create grids on heatmaps
