Predicting Hospital Readmission: A Joint Ensemble-Learning Model

Hospital readmission is among the most critical issues in the healthcare system because of its high prevalence and cost. Reducing readmissions requires reliable prediction models that identify high-risk patients effectively and enable healthcare practitioners to intervene strategically. Predictive analytics for hospital readmission based on electronic health records (EHRs) faces multiple challenges, including the high dimensionality and event sparsity of medical codes and class imbalance. To respond to these challenges, we propose a data-driven analytical framework using hospital inpatient administrative data from a nationwide healthcare dataset. A joint ensemble-learning model, which combines a modified weight boosting algorithm with a stacking algorithm, is developed and validated. First, we explore the effects of different feature engineering methods, which address the challenges of medical vector representation and medical vector sparsity. Second, ensemble learning with the proposed modified weight boosting algorithm is used to tackle the class imbalance problem and improve predictive performance. Finally, we account for varying misclassification costs by setting different weights for each class during model training. Compared with the benchmark models, the framework with the proposed modified weight boosting algorithm improves overall model performance by 22.7% and raises recall from 0.726 to as high as 0.891. Hospital practitioners can also use the prediction results obtained under different cost weights to select the most suitable readmission intervention for patients according to the penalty policy of the Centers for Medicare and Medicaid Services (CMS) and the cost trade-offs of their hospitals.
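To make the idea of class-dependent weights inside a boosting loop concrete, the following is a minimal sketch of one AdaBoost-style reweighting round in which each sample's updated weight is additionally multiplied by a per-class cost. It is not the modified weight boosting algorithm of this study, whose details are not given here. The function name `boosting_round`, the `class_cost` dictionary, the toy labels, and the cost values are all illustrative assumptions.

```python
import math


def boosting_round(weights, y_true, y_pred, class_cost):
    """One AdaBoost-style reweighting round with per-class cost multipliers.

    Illustrative sketch only (hypothetical helper, not the study's algorithm).
    Labels are +1 (readmitted) and -1 (not readmitted); `weights` is assumed
    to be normalized so that it sums to 1.
    """
    # Weighted error of the current weak learner.
    err = sum(w for w, y, p in zip(weights, y_true, y_pred) if y != p)
    # Clamp to avoid division by zero or log(0) for perfect/useless learners.
    err = min(max(err, 1e-10), 1 - 1e-10)
    # Standard AdaBoost learner weight.
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples are up-weighted (y * p = -1), correct ones are
    # down-weighted (y * p = +1); the class cost scales each sample further.
    raw = [
        class_cost[y] * w * math.exp(-alpha * y * p)
        for w, y, p in zip(weights, y_true, y_pred)
    ]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Toy example: 2 readmitted patients (+1) and 4 non-readmitted patients (-1).
y_true = [1, 1, -1, -1, -1, -1]
y_pred = [1, -1, -1, -1, -1, 1]  # one false negative, one false positive
weights = [1 / 6] * 6

# Assumed costs: missing a readmission is penalized 3x a false alarm.
alpha, new_weights = boosting_round(weights, y_true, y_pred, {1: 3.0, -1: 1.0})
```

In this toy setup the missed readmission (index 1) ends up with a larger weight than the false alarm (index 5), so the next weak learner is pushed toward the minority class. Varying the assumed cost ratio is one way to trade recall against false positives.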
