Big data hurdles in precision medicine and precision public health

Background: Current research in the biomedical sciences often attaches the term ‘precision’ to medicine and public health, alongside companion terms such as big data, data science, and deep learning. Technological advances permit the collection and merging of large heterogeneous datasets from different sources, from genome sequences to social media posts and from electronic health records to wearables. Additionally, complex algorithms supported by high-performance computing make it possible to transform these large datasets into knowledge. Despite such progress, many barriers still stand in the way of precision medicine and precision public health interventions that benefit the individual and the population.

Main body: The present work analyzes both the technical and the societal hurdles related to the development of prediction models of health risks, diagnoses, and outcomes from integrated biomedical databases. Methodological challenges that need to be addressed include improving the semantics of study designs: medical record data are inherently biased, and even the most advanced deep learning methods, such as denoising autoencoders, cannot overcome that bias if it is not handled a priori by design. Societal challenges include the evaluation of ethically actionable risk factors at the individual and population level; for instance, gender, race, or ethnicity, when used as risk modifiers rather than as biological variables, could be replaced by modifiable environmental proxies such as lifestyle and dietary habits, household income, or access to educational resources.

Conclusions: Data science for precision medicine and public health warrants an informatics-oriented formalization of the study design and interoperability throughout all levels of the knowledge inference process, from the research semantics, to model development, and ultimately to implementation.
