This repository contains slides, code, and supplementary materials for the "The results in Table 1 don’t seem to correspond to those in Figure 2" talk I gave at:
- DataTech, March 2019 in Edinburgh, Scotland [Slideshow]
- University of Cincinnati, April 2019 in Cincinnati, OH, USA [Slideshow]
- Connect IPSDS, June 2019 in Mannheim, Germany [Slideshow]
- ISCB RSG Turkey, August 2019 as a webinar [Slideshow]
- PyData, May 2020 in Edinburgh, UK, virtual meetup [Slideshow]
The talk was revised a bit each time.
For data analysis to be reproducible, the data and code should be assembled so that results (e.g. tables and figures) can be re-created. While the scientific community by and large agrees that reproducibility is a minimal standard by which data analyses should be evaluated, and a myriad of software tools for reproducible computing exist, it is still not trivial to reproduce someone's (sometimes your own!) results without wrestling with unavailable analysis data, external dependencies, missing packages, out-of-date software, etc. In this talk, we present good, better, and best workflows for reproducibility that touch on everything from data storage and cleaning to analysis and communication of final results.
Title credit goes to Karl Broman: https://www.biostat.wisc.edu/~kbroman/presentations/cmp2018.pdf
- The references folder contains the papers mentioned in the talk.
- Code used for generating figures and output in the talk is in the scripts folder.
- Images used in the talk are saved in the images folder.