Session Introduction

In January 2015, President Obama announced the Precision Medicine Initiative [1], strengthening communal efforts to integrate patient-centric molecular, environmental, and clinical “big” data. Such efforts have already improved aspects of clinical management for diseases such as non-small cell lung carcinoma [2], breast cancer [3], and hypertrophic cardiomyopathy [4]. To maintain this track record, it is necessary to cultivate practices that ensure reproducibility as large-scale heterogeneous datasets and databases proliferate. For example, the NIH has outlined initiatives to enhance reproducibility in preclinical research [5], both Science [6] and Nature [7] have featured recent editorials on reproducibility, and several authors have noted the challenges of using big data for public health [8]. Yet few methods exist to ensure that big data resources motivated by precision medicine are being used reproducibly. Relevant challenges include: (1) integrative analyses of heterogeneous measurement platforms (e.g., genomic, clinical, quantified-self, and exposure data), (2) the tradeoff in making personalized decisions using more targeted (e.g., individual-level) but potentially much noisier subsets

Pacific Symposium on Biocomputing 2016

[1] Rui Chang, et al. Exploring the Reproducibility of Probabilistic Causal Molecular Network Models, 2017, PSB.

[2] Can Zhang, et al. Data Sharing and Reproducible Clinical Genetic Testing: Successes and Challenges, 2017, PSB.

[3] Emre Guney, et al. Reproducible Drug Repurposing: When Similarity Does Not Suffice, 2017, PSB.

[4] Winston Haynes, et al. Empowering Multi-Cohort Gene Expression Analysis to Increase Reproducibility, 2016, bioRxiv.

[5] Gaurav Kaushik, et al. Graph Theory Approaches for Optimizing Biomedical Data Analysis Using Reproducible Workflows, 2016.

[6] S. Hewitt, et al. Reproducibility, 2019, Encyclopedia of Social Network Analysis and Mining, 2nd Ed.

[7] Patrick F. Sullivan, et al. Quantifying prion disease penetrance using large population control cohorts, 2016, Science Translational Medicine.

[8] James Y. Zou, et al. Analysis of protein-coding genetic variation in 60,706 humans, 2015, Nature.

[9] Olivier Lichtarge, et al. Repurposing Germline Exomes of the Cancer Genome Atlas Demands a Cautious Approach and Sample-Specific Variant Filtering, 2016, PSB.

[10] Isaac S. Kohane, et al. Reproducible and Shareable Quantifications of Pathogenicity, 2016, PSB.

[11] Russ B. Altman, et al. Dynamically Evolving Clinical Practices and Implications for Predicting Medical Decisions, 2016, PSB.

[12] Susan P. Holmes, et al. Reproducible Research Workflow in R for the Analysis of Personalized Human Microbiome Data, 2016, PSB.

[13] Chunhua Weng, et al. Identification of Questionable Exclusion Criteria in Mental Disorder Clinical Trials Using a Medical Encyclopedia, 2016, PSB.

[14] John P. A. Ioannidis, et al. Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations, 2015, Journal of Clinical Epidemiology.

[15] Yihui Xie, et al. Dynamic Documents with R and knitr, 2015.

[16] F. Collins, et al. A new initiative on precision medicine, 2015, The New England Journal of Medicine.

[17] John P. A. Ioannidis, et al. Big data meets public health, 2014, Science.

[18] John P. A. Ioannidis, et al. How to Make More Published Research True, 2014, PLoS Medicine.

[19] F. Collins, et al. Policy: NIH plans to enhance reproducibility, 2014, Nature.

[20] Qingpeng Zhang, et al. These Are Not the K-mers You Are Looking For: Efficient Online K-mer Counting Using a Probabilistic Data Structure, 2013, PLoS ONE.

[21] Journals unite for reproducibility, 2014, Nature.

[22] Joshua M. Stuart, et al. The Cancer Genome Atlas Pan-Cancer analysis project, 2013, Nature Genetics.

[23] Heidi L. Rehm, et al. Disease-targeted sequencing: a cornerstone in the clinic, 2013, Nature Reviews Genetics.

[24] ENCODE Consortium, et al. An Integrated Encyclopedia of DNA Elements in the Human Genome, 2012, Nature.

[25] Arend Hintze, et al. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs, 2011, Proceedings of the National Academy of Sciences.

[26] S. Miller, et al. Association of Risk-Reducing Surgery in BRCA1 or BRCA2 Mutation Carriers With Cancer Risk and Mortality, 2012.

[27] S. Stanley Young, et al. Deming, data and observational studies, 2011.

[28] N. Girard, et al. New driver mutations in non-small-cell lung cancer, 2011, The Lancet Oncology.

[29] Victoria Stodden, et al. The Scientific Method in Practice: Reproducibility in the Computational Sciences, 2010.

[30] Jean Y. H. Yang, et al. Bioconductor: open software development for computational biology and bioinformatics, 2004, Genome Biology.