Rethinking clinical prediction: Why machine learning must consider year of care and feature aggregation

Machine learning for healthcare often trains models on de-identified datasets with randomly shifted calendar dates, ignoring the fact that the data were generated under hospital operating practices that change over time. These changing practices induce definite changes in the observed data, which confound evaluations that do not account for dates and limit the generalisability of date-agnostic models. In this work, we establish the magnitude of this problem on MIMIC, a public hospital dataset, and showcase a simple solution. We augment MIMIC with the year in which care was provided and show that a model trained using standard feature representations degrades significantly in quality over time: we find a deterioration of 0.3 AUC when evaluating mortality prediction on data from 10 years later, and a similar deterioration of 0.15 AUC for length-of-stay prediction. In contrast, we demonstrate that clinically oriented aggregates of raw features significantly mitigate future deterioration. Our suggested aggregated representations, when retrained yearly, have prediction quality comparable to year-agnostic models.
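The two ideas in the abstract, aggregating raw features into clinical concepts and scoring a model separately for each year of care, can be illustrated with a minimal sketch. It is not the authors' pipeline: the item identifiers, the `CONCEPT_OF` mapping, and the helper names `aggregate`, `auc`, and `auc_by_year` are all hypothetical, and the per-concept mean is just one possible aggregate.

```python
from collections import defaultdict

# Hypothetical mapping from raw item identifiers to clinical concepts.
# Real identifiers and groupings would come from the dataset and from
# clinical expertise; these names are made up for illustration.
CONCEPT_OF = {
    "hr_old_system": "heart_rate",
    "hr_new_system": "heart_rate",
    "sbp_old_system": "systolic_bp",
    "sbp_new_system": "systolic_bp",
}


def aggregate(raw_measurements):
    """Average raw (item_id, value) pairs per clinical concept.

    Unmapped item identifiers are ignored.
    """
    grouped = defaultdict(list)
    for item_id, value in raw_measurements:
        concept = CONCEPT_OF.get(item_id)
        if concept is not None:
            grouped[concept].append(value)
    return {c: sum(v) / len(v) for c, v in grouped.items()}


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_year(records, score_fn):
    """Compute AUC separately for each year of care.

    records: iterable of (year, features, label) tuples.
    score_fn: callable mapping features to a risk score (assumed to be
    an already-trained model).
    """
    by_year = defaultdict(list)
    for year, features, label in records:
        by_year[year].append((label, score_fn(features)))
    return {
        year: auc([y for y, _ in pairs], [s for _, s in pairs])
        for year, pairs in sorted(by_year.items())
    }
```

Because both hypothetical heart-rate identifiers map to the same concept, a change in how a measurement is recorded leaves the aggregated feature in place, and reporting AUC per year makes any drop on later years visible instead of averaging it away.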
