Introduction to data driven life sciences

cutsort/BMMB554

Place and Time

Boucke Building 216 | Monday, Wednesday 2:30pm - 3:45pm

Instructor

Anton Nekrutenko
[email protected]
Wartik 505
Office hours by appointment only

❗ When contacting the instructor, use the above e-mail address and include "BMMB554" in the subject line (simply click on the e-mail address).

Course logistics

This course does not use Canvas. Canvas is a convoluted system with too many features and an undefined purpose. Instead, this course is served from GitHub. Each new lecture will be added as a new entry under the "Lectures" section of the website.

❗ Do not contact me through Canvas! I will not check my Inbox there. Instead, contact me via e-mail as described above.

Grading

  • Attendance (25%)
  • Homework (30%)
  • Final Project (45%)

A short homework will be given once a week. We will discuss project requirements before the spring break.

Topics

We will cover a breadth of topics. The course will be divided into several blocks:

Block 1: A toolset for genomics

| # | What | Why |
|---|------|-----|
| 1.1 | Introduction to this course | Who you are and what you want to accomplish |
| 1.2 | Jupyter and GitHub | Throughout this course Jupyter will be our primary framework for data analysis and GitHub will be used for version control |
| 1.3 | Python | Understanding transactional Python for data analysis |
| 1.4 | NumPy and SciPy | Libraries for numerical magic |
| 1.5 | Matplotlib | Plotting data in Python |
| 1.6 | Pandas and SQL | Operating on large datasets in Python |
| 1.7 | Galaxy | Data processing at scale |

Block 2: Quantitative refresher

| # | What | Why |
|---|------|-----|
| 2.1 | Fundamentals | Probability, Descriptive statistics, Correlation analysis, and Logarithms |
| 2.2 | Statistical analyses | Distributions, Sampling, Significance, Permutation, Bayes' Theorem |
| 2.3 | Visualization | Useful versus meaningless |
| 2.4 | Linear algebra | Matrix operations, Eigenvalues and Eigenvectors |
| 2.5 | Discrete data and modeling | Understanding the data and going upward |
| 2.6 | Clustering | Stratifying the data |
| 2.7 | Testing | How to test your hypotheses |
| 2.8 | Multivariate analysis | Finding associations among multiple variables |

Block 3: Sources and types of genomic data

| # | What | Why |
|---|------|-----|
| 3.1 | DNA (and RNA) sequencing | From Sanger to Nanopores |
| 3.2 | Variation | Finding and interpreting genetic differences |
| 3.3 | Transcriptomics | Measuring gene expression |
| 3.4 | Transcriptomics | Measuring shapes |
| 3.5 | DNA/Protein interactions | Assessing gene regulation and genome architecture |
| 3.6 | Metagenomics | Analysis of complex mixtures |
| 3.7 | About counts | NGS data is count data. There are common themes in read count analysis |

Block 4: Computational biology basics

| # | What | Why |
|---|------|-----|
| 4.1 | Alignment | Fundamental concepts, Global and local alignment |
| 4.2 | Aligning many sequences quickly | Mapping in the age of billion-read datasets |
| 4.3 | Bloom filters | Searching in the age of Exabyte databases |
| 4.4 | Assembly | Reconstructing genomes and transcriptomes |

Final Project

We will get into specifics of the final project before spring break.

ECoS Teaching Statement

In an examination setting, unless the instructor gives explicit prior instructions to the contrary, violations of academic integrity shall consist of any attempt to receive assistance from written or printed aids, from any person or papers or electronic devices, or of any attempt to give assistance, whether the student doing so has completed his or her own work or not. Other violations include, but are not limited to, any attempt to gain an unfair advantage in regard to an examination, such as tampering with a graded exam or claiming another's work to be one's own. Other assessments (including ANGEL-administered quizzes and assessments as well as homework assignments) are expected to represent your own independent work unless specifically stated otherwise. Failure to comply will lead to sanctions against the student in accordance with the Policy on Academic Integrity in the Eberly College of Science.

The Eberly College of Science Code of Mutual Respect and Cooperation (www.science.psu.edu/climate/Code-of-Mutual-Respect-final.pdf) embodies the values that we hope our faculty, staff, and students possess and will endorse to make The Eberly College of Science a place where every individual feels respected and valued, as well as challenged and rewarded.

The Eberly College of Science is committed to the academic success of students enrolled in the College's courses and undergraduate programs. When in need of help, students can utilize various College and University wide resources for learning assistance (http://www.science.psu.edu/advising/success).

Penn State welcomes students with disabilities into the University's educational programs. If you have a disability-related need for reasonable academic adjustments in this course, contact the Office for Disability Services (ODS) at 814-863-1807 (V/TTY). For further information regarding ODS, please visit the Office for Disability Services Web site at http://equity.psu.edu/ods/.

In order to receive consideration for course accommodations, you must contact ODS and provide documentation (see the documentation guidelines). If the documentation supports the need for academic adjustments, ODS will provide a letter identifying appropriate academic adjustments. Please share this letter and discuss the adjustments with your instructor as early in the course as possible. You must contact ODS and request academic adjustment letters at the beginning of each semester.
