Can Deep Clinical Models Handle Real-World Domain Shifts?

The prospect that computational models can be reliable enough to be adopted in prognosis and patient care is revolutionizing healthcare. Deep learning in particular has been a game changer for building predictive models, prompting community-wide data curation efforts. However, owing to inherent variability in population characteristics and biological systems, these models are often biased toward their training datasets. This can be limiting when models are deployed in new environments, particularly when there are systematic domain shifts that are not known a priori. In this paper, we formalize these challenges by emulating a large class of domain shifts that can occur in clinical settings, and argue that evaluating the behavior of predictive models in light of those shifts is an effective way to quantify the reliability of clinical models. More specifically, we develop an approach for building challenging scenarios based on an analysis of disease landscapes, and use unsupervised domain adaptation to compensate for the domain shifts. Using the openly available MIMIC-III EHR dataset for phenotyping, we generate a large class of scenarios and evaluate how well deep clinical models perform in them. For the first time, our work sheds light on data regimes where deep clinical models can fail to generalize because of significant changes in the disease landscape between the source and target domains. This study emphasizes the need for sophisticated evaluation mechanisms, driven by real-world domain shifts, to build effective AI solutions for healthcare.
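As a rough illustration of what a shift in the disease landscape can look like, the sketch below splits a toy multi-label phenotype matrix into a source cohort (patients without a chosen "pivot" phenotype) and a target cohort (patients with it), then reports how the prevalence of every phenotype changes between the two. This is an assumed, simplified construction for intuition only, not the scenario-generation procedure described above; the phenotype names, the helper functions (`prevalence`, `split_by_phenotype`, `landscape_shift`), and the data are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical phenotype labels; each patient is a list of 0/1 flags in this order.
PHENOTYPES = ["sepsis", "acute_renal_failure", "heart_failure", "diabetes"]

Cohort = List[List[int]]


def prevalence(cohort: Cohort) -> Dict[str, float]:
    """Fraction of patients in `cohort` carrying each phenotype."""
    if not cohort:
        raise ValueError("cohort must contain at least one patient")
    n = len(cohort)
    return {
        name: sum(patient[j] for patient in cohort) / n
        for j, name in enumerate(PHENOTYPES)
    }


def split_by_phenotype(patients: Cohort, pivot: str) -> Tuple[Cohort, Cohort]:
    """Assumed shift: source = patients without `pivot`, target = patients with it."""
    j = PHENOTYPES.index(pivot)
    source = [p for p in patients if p[j] == 0]
    target = [p for p in patients if p[j] == 1]
    return source, target


def landscape_shift(source: Cohort, target: Cohort) -> Dict[str, float]:
    """Per-phenotype change in prevalence from the source to the target cohort."""
    ps, pt = prevalence(source), prevalence(target)
    return {name: pt[name] - ps[name] for name in PHENOTYPES}


# Toy data (hypothetical): six patients, four phenotype flags each.
patients = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
source, target = split_by_phenotype(patients, "sepsis")
print(landscape_shift(source, target))
```

A model trained only on `source` and evaluated on `target` would face co-occurring phenotypes whose prevalences differ markedly from training, which is the kind of systematic shift the abstract argues evaluations should probe; unsupervised domain adaptation would additionally use unlabeled `target` features during training.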