From Single-Hospital to Multi-Centre Applications: Enhancing the Generalisability of Deep Learning Models for Adverse Event Prediction in the ICU

Deep learning (DL) can help clinicians detect deteriorating patients early, giving them time to intervene and prevent adverse outcomes. While DL-based early warning models usually perform well at the hospitals they were trained on, they tend to be less reliable when applied at new hospitals, which makes them difficult to deploy at scale. Using carefully harmonised intensive care data from four data sources across Europe and the US (totalling 334,812 stays), we systematically assessed the reliability of DL models for three common adverse events: death, acute kidney injury (AKI), and sepsis. We tested whether training on more than one data source and/or explicitly optimising for generalisability during training improves model performance at new hospitals. At the training hospital, models achieved a high area under the receiver operating characteristic curve (AUROC) for mortality (0.838-0.869), AKI (0.823-0.866), and sepsis (0.749-0.824). As expected, performance dropped at new hospitals, sometimes by as much as 0.200 AUROC. Training on more than one data source mitigated this drop, with multi-source models performing roughly on par with the best single-source model. This suggests that as data from more hospitals become available for training, model robustness is likely to increase, with performance at a new hospital bounded from below by that of the most applicable data source in the training data. Dedicated methods for promoting generalisability did not noticeably improve performance in our experiments.
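The evaluation idea described above (train on some sources, test on a held-out one, and compare AUROC at the training hospital with AUROC at the new hospital) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `auroc`, `auroc_drop` and `held_out_splits`, and the placeholder source labels, are assumptions made purely for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auroc_drop(internal, external):
    """Difference between AUROC at the training hospital and at a new
    hospital. Each argument is a (labels, scores) pair; a positive
    result means the model did worse at the new hospital."""
    return auroc(*internal) - auroc(*external)


def held_out_splits(sources):
    """Yield (training_sources, held_out_source) pairs so that every
    source serves once as the unseen 'new hospital'."""
    for held_out in sources:
        yield [s for s in sources if s != held_out], held_out


if __name__ == "__main__":
    # Hypothetical placeholder labels, not the data sources' real names.
    for train, test in held_out_splits(["A", "B", "C", "D"]):
        print(f"train on {train}, evaluate on {test}")
```

In this sketch, a multi-source model would be trained on each `training_sources` list and scored on the held-out source, and `auroc_drop` would quantify how far its performance falls relative to evaluation on data from a training hospital.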
