-
Notifications
You must be signed in to change notification settings - Fork 0
Analysis of DHFR deep mutational scanning data collected in varied TYMS genetic backgrounds
License
reynoldsk/dhfr-tyms-epistasis
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
******************************************************************************** This directory contains the data and python notebooks necessary to reproduce the analysis in: Nguyen T.N., Ingle C, Thompson S.M. and Reynolds K.A. "The genetic lanscape of a metabolic interaction" Copyright (C) 2022 Kimberly A. Reynolds This collection of python notebooks is free software distributed under the BSD 3-clause license, please see the file LICENSE for details ******************************************************************************** This code is written in Python 3, and makes use of numpy (v.1.20.1 was used in development), matplotlib (3.3.4), pandas (1.2.4), scipy (1.6.2), pickle (0.7.5), scikit-learn (0.24.1), biopython (1.78), and seaborn (0.11.1). It has been tested on mac OS13, but has no special dependencies on hardware or OS. It is presented as a series of four Jupyter notebooks. We suggest installing the Anaconda distribution of python for scientific computing as one route to obtaining the necessary dependencies. Once Anaconda has been installed, the Jupyter notebooks can be run following standard usage. Clicking through each cell will repeat the calculations and generate the figures in the main text of Nguyen et al. Most notebooks are expected to run on the order of a few minutes on a standard laptop computer, however the global model fit in notebook 4 is expected to take order hours. For this reason, we also provide the full model fit (for import) in a pickle database. The code provides an analysis of our DHFR deep mutational scanning data in three TYMS backgrounds (WT, Q33S, R166Q). The analysis starts with "counts" of each mutant for specific combinations of time point, replicate, TYMS background and DHFR sublibrary as measured during a selection time course by next generation sequencing. The code calculates growth rate and epistasis from these data. The notebooks also build a model for how DHFR and TYMS biochemical parameters relate to E. coli growth rate, use a limited subset of the experimental data to fit the model, and subsequently examine the correspondence between the model and the remiander of the data. Contents: .gitignore specifies intentionally untracked files that git should ignore (not track during commits) README this file LICENSE BSD 3-clause license Figs/ directory of output figures generated by the python notebooks Input/ non-sequencing based input data: PDB files, plate reader data kinetic parameters for experimentally characterized DHFR and TYMS mutants Output/ Output data in the form of tab-delimited text files and pickle files Includes growth rates, epistasis, and best-fit model parameters Tables/ Kinetic parameters for experimentally characterized DHFR and TYMS mutants (formatted as excel tables) 210514_mutantCounts/ The mutant counts used for inferring growth rate, and then epistasis More details on the generation of the counts files can be found in 210514_mutantCounts/README 1_KGModel.ipynb First jupyter notebook in the series. This notebook constructs a biochemistry to phenotype model, by approximating the paired oxidation/reduction reaction (catalyzed by DHFR/TYMS) as a Koshland-Goldbeter-like cycle 2_DMSGrowthRates.ipynb Second jupyter notebook in the series. This notebook reads in the raw counts data, calculates relative growth rates for all mutations in the experiment, and examines the distribution of fitness effect (DFE) across TYMS background. 3_Epistasis.ipynb Third jupyter notebook in the series. This notebook read the growth rates from 2_DMSGrowthRates, and computes epistasis. The code assesses the statistical significance of individual epistatic measurements, and clusters DHFR positions by epistatic profile so that we can identify potential structural patterns. 4_ModelAndDMSData.ipynb Fourth and final jupyter notebook in the series. Exmaines the correspondence between the biochemistry-to-phenotype model (as developed in 1_KGModel) and the complete dataset. plot_style.py small python module used to set figure appearance, create grids on heatmaps
About
Analysis of DHFR deep mutational scanning data collected in varied TYMS genetic backgrounds
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published