A Knowledge Distillation Ensemble Framework for Predicting Short and Long-term Hospitalisation Outcomes from Electronic Health Records Data

The ability to perform accurate prognosis of patients is crucial for proactive clinical decision-making, informed resource management and personalised care. Existing outcome prediction models suffer from a low recall of infrequent positive outcomes. We present a highly scalable and robust machine learning framework to automatically predict adversity, represented by mortality and ICU admission, from time-series vital signs and laboratory results obtained within the first 24 hours of hospital admission. The stacked platform comprises two components: a) an unsupervised LSTM Autoencoder that learns an optimal representation of the time-series, using it to differentiate the less frequent patterns which conclude with an adverse event from the majority patterns that do not, and b) a gradient boosting model, which relies on the constructed representation to refine prediction, incorporating static features of demographics, admission details and clinical summaries. The model is used to assess a patient's risk of adversity over time and provides visual justifications of its prediction based on the patient's static features and dynamic signals. Results of three case studies for predicting mortality and ICU admission show that the model outperforms all existing outcome prediction models, achieving PR-AUC of 0.93 (95% CI: 0.878-0.969) in predicting mortality in ICU and general ward settings and 0.987 (95% CI: 0.985-0.995) in predicting ICU admission.
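
The PR-AUC figures and confidence intervals quoted above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the abstract does not state how PR-AUC or the intervals were computed, so the sketch assumes average precision as the PR-AUC estimator and a percentile bootstrap for the 95% interval. The function names `average_precision` and `bootstrap_ci` are hypothetical, and tied scores are ranked in input order.

```python
import random


def average_precision(y_true, scores):
    """Average precision, one common estimator of PR-AUC (assumed here).

    Mean of the precision values taken at the rank of each positive case,
    after sorting cases by descending score.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for average precision (an assumption)."""
    if sum(y_true) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if sum(ys) == 0:
            continue  # resample drew no positives; the metric is undefined
        stats.append(average_precision(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy, made-up labels and risk scores with a rare positive class.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
ap = average_precision(labels, risk)
ci_low, ci_high = bootstrap_ci(labels, risk, n_boot=500)
```

Because average precision is computed over the positive class only, it is sensitive to how well the rare adverse outcomes are ranked, which is why it suits imbalanced outcomes such as mortality better than accuracy does.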
