A Review of Challenges and Opportunities in Machine Learning for Health

Modern electronic health records (EHRs) provide data to answer clinically meaningful questions. The growing volume of data in EHRs makes healthcare ripe for machine learning. However, learning in a clinical setting presents unique challenges that complicate the use of common machine learning methodologies. For example, diseases in EHRs are poorly labeled, conditions can encompass multiple underlying endotypes, and healthy individuals are underrepresented. This article serves as a primer that illuminates these challenges and highlights opportunities for members of the machine learning community to contribute to healthcare.

[1]  Árton,et al.  Risk factors for injury to women from domestic violence. , 1999, The New England journal of medicine.

[2]  Jacqueline A Pugh,et al.  Audit and feedback and clinical practice guideline adherence: Making feedback actionable , 2006, Implementation science : IS.

[3]  Sherri Rose,et al.  Machine Learning for Prediction in Electronic Health Data. , 2018, JAMA network open.

[4]  Yuval Shahar,et al.  Evaluation of an automated knowledge-based textual summarization system for longitudinal clinical data, in the intensive care domain , 2017, Artif. Intell. Medicine.

[5]  Adler J. Perotte,et al.  Learning probabilistic phenotypes from heterogeneous EHR data , 2015, J. Biomed. Informatics.

[6]  Joshua D. Angrist,et al.  Mostly Harmless Econometrics: An Empiricist's Companion , 2008 .

[7]  Been Kim,et al.  Towards A Rigorous Science of Interpretable Machine Learning , 2017, 1702.08608.

[8]  D. Bates,et al.  Improving safety with information technology. , 2003, The New England journal of medicine.

[9]  Nicole A. Lazar,et al.  Statistical Analysis With Missing Data , 2003, Technometrics.

[10]  S. Senn,et al.  Understanding Variation in Sets of N-of-1 Trials , 2016, PloS one.

[11]  John A. Kellum,et al.  Paradigms of acute kidney injury in the intensive care setting , 2018, Nature Reviews Nephrology.

[12]  Wei Chu,et al.  Information Services]: Web-based services , 2022 .

[13]  Suchi Saria,et al.  A Framework for Individualizing Predictions of Disease Trajectories by Exploiting Multi-Resolution Structure , 2015, NIPS.

[14]  Z. Bar-Joseph,et al.  Using neural networks for reducing the dimensions of single-cell RNA-Seq data , 2017, Nucleic acids research.

[15]  Charles A. Johnson,et al.  Patient-Centered Medicine: Transforming the Clinical Method , 1995 .

[16]  I. Kohane,et al.  Finding the missing link for big biomedical data. , 2014, JAMA.

[17]  Rui Chen,et al.  Promise of personalized omics to precision medicine , 2013, Wiley interdisciplinary reviews. Systems biology and medicine.

[18]  Andre Esteva,et al.  A guide to deep learning in healthcare , 2019, Nature Medicine.

[19]  Judea Pearl,et al.  Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution , 2018, WSDM.

[20]  Jeffrey Dean,et al.  Scalable and accurate deep learning with electronic health records , 2018, npj Digital Medicine.

[21]  Suchi Saria,et al.  Integrative Analysis using Coupled Latent Variable Models for Individualizing Prognoses , 2016, J. Mach. Learn. Res..

[22]  Sebastian Thrun,et al.  Dermatologist-level classification of skin cancer with deep neural networks , 2017, Nature.

[23]  Michael Wainberg,et al.  Deep learning in biomedicine , 2018, Nature Biotechnology.

[24]  A. Järvinen,et al.  Telephone consultation cannot replace bedside infectious disease consultation in the management of Staphylococcus aureus Bacteremia. , 2013, Clinical infectious diseases : an official publication of the Infectious Diseases Society of America.

[25]  E. Topol,et al.  Adapting to Artificial Intelligence: Radiologists and Pathologists as Information Specialists. , 2016, JAMA.

[26]  Judith Strymish,et al.  Medicine's uncomfortable relationship with math: calculating positive predictive value. , 2014, JAMA internal medicine.

[27]  Richard Beasley,et al.  External validity of randomised controlled trials in asthma: to whom do the results of the trials apply? , 2006, Thorax.

[28]  D. Marshall,et al.  Capacity building for assessing new technologies: approaches to examining personalized medicine in practice. , 2010, Personalized medicine.

[29]  David K. Vawdrey,et al.  Can Patient Record Summarization Support Quality Metric Abstraction? , 2016, AMIA.

[30]  Alexey Tsymbal,et al.  The problem of concept drift: definitions and related work , 2004 .

[31]  Andrew L. Beam,et al.  Adversarial Attacks Against Medical Deep Learning Systems , 2018, ArXiv.

[32]  L. Hood,et al.  Systems cancer medicine: towards realization of predictive, preventive, personalized and participatory (P4) medicine , 2012, Journal of internal medicine.

[33]  Jeffrey A. Golden,et al.  Deep Learning Algorithms for Detection of Lymph Node Metastases From Breast Cancer: Helping Artificial Intelligence Be Seen. , 2017, JAMA.

[34]  J. Brown Patient-centred medicine: Transforming the clinical method , 1998 .

[35]  Douwe Kiela,et al.  Poincaré Embeddings for Learning Hierarchical Representations , 2017, NIPS.

[36]  J. Arthur,et al.  Biomarkers of AKI: a review of mechanistic relevance and potential therapeutic implications. , 2015, Clinical journal of the American Society of Nephrology : CJASN.

[37]  Garnet L Anderson,et al.  Combined analysis of Women's Health Initiative observational and clinical trial data on postmenopausal hormone treatment and cardiovascular disease. , 2006, American journal of epidemiology.

[38]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Rui Chen,et al.  Systems biology: personalized medicine for the future? , 2012, Current opinion in pharmacology.

[40]  Isaac S Kohane,et al.  Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study , 2009, BMJ : British Medical Journal.

[41]  J. Kmenta Mostly Harmless Econometrics: An Empiricist's Companion , 2010 .

[42]  Richard S. Zemel,et al.  Recommender Systems, Missing Data and Statistical Model Estimation , 2011, IJCAI.

[43]  Jennifer G. Robinson,et al.  Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. , 2013, Journal of the American Medical Informatics Association : JAMIA.

[44]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[45]  R. Saunders,et al.  Best Care at Lower Cost: The Path to Continuously Learning Health Care in America , 2013 .

[46]  Finale Doshi-Velez,et al.  Comorbidity Clusters in Autism Spectrum Disorders: An Electronic Health Record Time-Series Analysis , 2014, Pediatrics.

[47]  Francesco Santini,et al.  Mortality in Multicenter Critical Care Trials: An Analysis of Interventions With a Significant Effect* , 2015, Critical care medicine.

[48]  A. Milstein,et al.  Making Machine Learning Models Clinically Useful. , 2019, JAMA.

[49]  Peter Szolovits,et al.  Semi-Supervised Biomedical Translation With Cycle Wasserstein Regression GANs , 2018, AAAI.

[50]  J. Robins,et al.  Sensitivity Analysis for Selection bias and unmeasured Confounding in missing Data and Causal inference models , 2000 .

[51]  Cynthia Rudin,et al.  Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model , 2015, ArXiv.

[52]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[53]  Suchi Saria,et al.  Reliable Decision Support using Counterfactual Models , 2017, NIPS.

[54]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[55]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[56]  S. Ieraci,et al.  Streaming by case complexity: Evaluation of a model for emergency department Fast Track , 2008, Emergency medicine Australasia : EMA.

[57]  Kristian Thorlund,et al.  Demystifying trial networks and network meta-analysis , 2013, BMJ.

[58]  K. Hamberg,et al.  Doubly blind: a systematic review of gender in randomised controlled trials , 2016, Global health action.

[59]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[60]  Zhiyong Lu,et al.  Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health. , 2016, Advances in experimental medicine and biology.

[61]  Subhashini Venugopalan,et al.  Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. , 2016, JAMA.

[62]  J. Kai,et al.  Can machine-learning improve cardiovascular risk prediction using routine clinical data? , 2017, PloS one.

[63]  Rae Woong Park,et al.  Characterizing treatment pathways at scale using the OHDSI network , 2016, Proceedings of the National Academy of Sciences.

[64]  Anna Goldenberg,et al.  Feature Robustness in Non-stationary Health Records: Caveats to Deployable Model Performance in Common Clinical Machine Learning Tasks , 2019, MLHC.

[65]  D. Rubin,et al.  Statistical Analysis with Missing Data. , 1989 .

[66]  Yan Liu,et al.  Recurrent Neural Networks for Multivariate Time Series with Missing Values , 2016, Scientific Reports.

[67]  Pierre Baldi,et al.  Deep Learning in Biomedical Data Science , 2018, Annual Review of Biomedical Data Science.

[68]  Michael V. McConnell,et al.  Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning , 2017, Nature Biomedical Engineering.

[69]  Luca Foschini,et al.  Adversarial Examples for Electrocardiograms , 2019, ArXiv.

[70]  Peter Szolovits,et al.  Clinical Intervention Prediction and Understanding with Deep Neural Networks , 2017, MLHC.

[71]  K. Lum,et al.  To predict and serve? , 2016 .

[72]  Tianxi Cai,et al.  High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records , 2018, Biological Psychiatry.

[73]  I. Kohane,et al.  Deep learning predicts tuberculosis drug resistance status from genome sequencing data , 2018, bioRxiv.

[74]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[75]  Gustavo Carneiro,et al.  Detecting hip fractures with radiologist-level performance using deep neural networks , 2017, ArXiv.

[76]  Li Fei-Fei,et al.  Towards Vision-Based Smart Hospitals: A System for Tracking and Monitoring Hand Hygiene Compliance , 2017, MLHC.

[77]  Sung-Bae Cho,et al.  Towards Creative Evolutionary Systems with Interactive Genetic Algorithm , 2002, Applied Intelligence.

[78]  Fan Li,et al.  Causal Inference: A Missing Data Perspective , 2017, 1712.06170.

[79]  Hany Farid,et al.  The accuracy, fairness, and limits of predicting recidivism , 2018, Science Advances.

[80]  Judea Pearl,et al.  Causal Inference , 2010 .

[81]  Peter Szolovits,et al.  Understanding vasopressor intervention and weaning: risk prediction in a public heterogeneous clinical time series database , 2017, J. Am. Medical Informatics Assoc..

[82]  Eric J Topol,et al.  High-performance medicine: the convergence of human and artificial intelligence , 2019, Nature Medicine.

[83]  M. Howell,et al.  Ensuring Fairness in Machine Learning to Advance Health Equity , 2018, Annals of Internal Medicine.

[84]  J. Beckmann,et al.  Reconciling evidence-based medicine and precision medicine in the era of big data: challenges and opportunities , 2016, Genome Medicine.

[85]  Constantin F. Aliferis,et al.  Predicting dire outcomes of patients with community acquired pneumonia , 2005, J. Biomed. Informatics.

[86]  Yonatan Halpern,et al.  Semi-Supervised Learning for Electronic Phenotyping in Support of Precision Medicine , 2016 .

[87]  Suchi Saria,et al.  Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport , 2018, AISTATS.

[88]  Adler J. Perotte,et al.  Deep Survival Analysis , 2016, MLHC.

[89]  John F. Hurdle,et al.  Measuring diagnoses: ICD code accuracy. , 2005, Health services research.

[90]  Percy Liang,et al.  Understanding Black-box Predictions via Influence Functions , 2017, ICML.

[91]  I. Kohane,et al.  Big Data and Machine Learning in Health Care. , 2018, JAMA.

[92]  X. Bonfill,et al.  Hormone replacement therapy for preventing cardiovascular disease in post-menopausal women. , 2005, The Cochrane database of systematic reviews.

[93]  H. Damasio,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence: Special Issue on Perceptual Organization in Computer Vision , 1998 .

[94]  J. Pearl,et al.  Confounding and Collapsibility in Causal Inference , 1999 .

[95]  David Cohn,et al.  Active Learning , 2010, Encyclopedia of Machine Learning.

[96]  C. Ronco,et al.  The RIFLE criteria and mortality in acute kidney injury: A systematic review. , 2008, Kidney international.

[97]  J. Pearl,et al.  Causal inference , 2011, Twenty-one Mental Models That Can Change Policing.

[98]  et al.,et al.  Assessment of a personalized and distributed patient guidance system , 2017, Int. J. Medical Informatics.

[99]  A. Khwaja KDIGO Clinical Practice Guidelines for Acute Kidney Injury , 2012, Nephron Clinical Practice.

[100]  D. Lazer,et al.  The Parable of Google Flu: Traps in Big Data Analysis , 2014, Science.

[101]  Ziad Obermeyer,et al.  Lost in Thought - The Limits of the Human Mind and the Future of Medicine. , 2017, The New England journal of medicine.

[102]  Le Song,et al.  GRAM: Graph-based Attention Model for Healthcare Representation Learning , 2016, KDD.

[103]  David Sontag,et al.  Electronic medical record phenotyping using the anchor and learn framework , 2016, J. Am. Medical Informatics Assoc..

[104]  Peter Szolovits,et al.  Predicting intervention onset in the ICU with switching state space models , 2017, CRI.

[105]  Simcha Pollack,et al.  The impact of standardized order sets and intensive clinical case management on outcomes in community-acquired pneumonia. , 2007, Archives of internal medicine.

[106]  Suchi Saria,et al.  A Bayesian Nonparametic Approach for Estimating Individualized Treatment-Response Curves , 2016, ArXiv.

[107]  M. Ghassemi,et al.  Can AI Help Reduce Disparities in General Medical and Mental Health Care? , 2019, AMA journal of ethics.

[108]  Jin Tian,et al.  Graphical Models for Inference with Missing Data , 2013, NIPS.

[109]  Aram Galstyan,et al.  Multitask learning and benchmarking with clinical time series data , 2017, Scientific Data.

[110]  Adler J. Perotte,et al.  Deep Survival Analysis: Nonparametrics and Missingness , 2018, MLHC.

[111]  Yuval Shahar,et al.  An architecture for a continuous, user-driven, and data-driven application of clinical guidelines and its evaluation , 2016, J. Biomed. Informatics.

[112]  Ben Shneiderman,et al.  Improving Healthcare with Interactive Visualization , 2013, Computer.

[113]  Peter Szolovits,et al.  Enabling phenotypic big data with PheNorm , 2018, J. Am. Medical Informatics Assoc..

[114]  Suchi Saria,et al.  Counterfactual Normalization: Proactively Addressing Dataset Shift Using Causal Mechanisms , 2018, UAI.

[115]  R. Charon,et al.  The patient-physician relationship. Narrative medicine: a model for empathy, reflection, profession, and trust. , 2001, JAMA.

[116]  Peter Szolovits,et al.  Predicting Clinical Outcomes Across Changing Electronic Health Record Systems , 2017, KDD.