A machine learning model for predicting ICU readmissions and key risk factors: analysis from longitudinal health records

Because of the high costs, resource demands and management burden associated with readmission to Intensive Care Units (ICUs), ICU readmission has been a focus of clinical research. Previous research identified several common risk factors and proposed a variety of frameworks to predict ICU readmissions, although some studies reported that many risk factors were too specific and/or had a limited focus. This study investigates whether the relevance of ICU readmission risk factors has changed over time. We used the MIMIC-III database, with 42,307 ICU stays of 31,749 patients from a US hospital, covering medical services provided from 2001 to 2012. The dataset was first split into two chronological subsets (2001–2008 and 2008–2012), and each subset was then split into training (70%) and test (30%) datasets. The training datasets were rebalanced by undersampling. To identify whether the most relevant risk factors change over time, 13 variables (12 features and one class) were selected and a three-step machine learning approach was applied: (i) Numerical Analysis, to identify overall quantitative changes; (ii) Feature Correlation Value Analysis, to rank the most important risk factors in each subset and compare the rankings for significant changes; and (iii) Classifier Performance Analysis, to identify changes in the predictive capability of the risk factors, based on three machine learning algorithms: Multilayer Perceptron, Random Forest and Support Vector Machine. In readmission rates, some changes were observed for patients using private insurance (variation of +3.0%) and for patients first admitted to the ICU through the Medical Intensive Care Unit (−3.1%). In the feature analysis, the two most relevant variables were the same in both datasets, with similar correlation values.
When the machine learning algorithms were applied to the test datasets, the models produced similar results for both periods, with a best accuracy of 86.4% and an Area Under the ROC Curve (AUC) of 0.642. Between the first and second periods, AUC values differed by up to 0.05 (higher in the first period) and accuracy by up to 4% (higher in the second period). Overall, the results indicate that the most relevant risk factors were stable over the years, with some minor changes. Further research is required to incorporate other readmission risk factors, such as social determinants and mental health and well-being.
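
The data preparation described above (chronological split, 70/30 train/test split, undersampling of the training data only) can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields `year` and `readmitted`, the helper names, the synthetic data and the choice to place 2008 in the first period are all assumptions made for illustration.

```python
import random
from collections import Counter


def chronological_split(records, cutoff_year):
    """Return (earlier, later): records up to and including cutoff_year, and the rest."""
    earlier = [r for r in records if r["year"] <= cutoff_year]
    later = [r for r in records if r["year"] > cutoff_year]
    return earlier, later


def train_test_split(records, train_fraction, rng):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def undersample(records, rng):
    """Randomly drop records until every class is as small as the smallest class."""
    by_class = {}
    for r in records:
        by_class.setdefault(r["readmitted"], []).append(r)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Synthetic stand-in data (NOT MIMIC-III): 1,200 stays, 10% flagged as readmitted.
rng = random.Random(0)
records = [{"year": 2001 + i % 12, "readmitted": i % 10 == 0} for i in range(1200)]

# Assumption: 2008 is assigned to the first period; the source lists overlapping ranges.
earlier, later = chronological_split(records, cutoff_year=2008)

splits = {}
for name, subset in (("first period", earlier), ("second period", later)):
    train, test = train_test_split(subset, train_fraction=0.7, rng=rng)
    # Only the training data is rebalanced; the test data keeps its original class ratio.
    balanced_train = undersample(train, rng)
    splits[name] = (balanced_train, test)
    print(name,
          "train classes:", dict(Counter(r["readmitted"] for r in balanced_train)),
          "test size:", len(test))
```

Leaving the test sets unbalanced means that accuracy and AUC are measured on the original class distribution, which is one reason the two metrics can diverge on imbalanced readmission data.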
