Translational medicine in the Age of Big Data

Abstract: The ability to collect, store and analyze massive amounts of molecular and clinical data is fundamentally transforming the scientific method and its application in translational medicine. Collecting observations has always been a prerequisite for discovery, and great leaps in scientific understanding are accompanied by an expansion of this ability. Particle physics, astronomy and climate science, for example, have all greatly benefited from the development of new technologies enabling the collection of larger and more diverse data. Unlike medicine, however, each of these fields also has a mature theoretical framework against which new data can be evaluated and into which they can be incorporated. Medicine has no such framework: there are no ‘first principles’ from which a healthy human could be analytically derived. The worry, and it is a valid one, is that without a strong theoretical underpinning, the inundation of data will cause medical research to devolve into a haphazard enterprise without discipline or rigor. The Age of Big Data holds tremendous opportunity for biomedical advances, but it will also be treacherous for, and demanding of, future scientists.
