Predictive Modeling of Hospital Readmission Rates Using Electronic Medical Record-Wide Machine Learning: A Case-Study Using Mount Sinai Heart Failure Cohort

Reducing preventable hospital readmissions that result from chronic or acute conditions such as stroke, heart failure, myocardial infarction and pneumonia remains a significant challenge for improving outcomes and decreasing the cost of healthcare delivery in the United States. Patient readmission rates remain relatively high for conditions like heart failure (HF) despite the implementation of high-quality healthcare delivery operation guidelines created by regulatory authorities. Multiple predictive models are currently available to evaluate potential 30-day readmission rates of patients. Most of these models are hypothesis-driven and repeatedly assess the same set of biomarkers as predictive features. In this manuscript, we discuss our attempt to develop a data-driven, electronic medical record-wide (EMR-wide) feature selection approach, followed by machine learning, to predict readmission probabilities. We assessed a large repertoire of variables from the electronic medical records of heart failure patients at a single center. The cohort included 1,068 patients, of whom 178 were readmitted within 30 days (16.67% readmission rate). A total of 4,205 variables were extracted from the EMR, including diagnosis codes (n=1,763), medications (n=1,028), laboratory measurements (n=846), surgical procedures (n=564) and vital signs (n=4). We designed a multistep modeling strategy using the Naïve Bayes algorithm. In the first step, we created individual models to classify cases (readmitted) and controls (non-readmitted). In the second step, features contributing to predictive risk in the individual models were combined into a composite model using a correlation-based feature selection (CFS) method. All models were trained and tested using a 5-fold cross-validation method, with 70% of the cohort used for training and the remaining 30% for testing.
Compared to existing predictive models for HF readmission rates (AUCs in the range of 0.6-0.7), results from our EMR-wide predictive model (AUC=0.78; Accuracy=83.19%) and phenome-wide feature selection strategies are encouraging and reveal the utility of such data-driven machine learning. Fine-tuning of the model, replication in multi-center cohorts and a prospective clinical trial to evaluate clinical utility would help the adoption of the model as a clinical decision system for evaluating readmission status.
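
To illustrate the correlation-based feature selection (CFS) step named above, the following is a minimal sketch and not the implementation used in the study. It assumes the commonly cited CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), absolute Pearson correlation as the association measure, and a greedy forward search. The function names, the `min_gain` threshold and the toy data (120 synthetic patients with a 1-in-6 readmission rate) are hypothetical and chosen only for illustration.

```python
import math


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / math.sqrt(var_x * var_y)


def cfs_merit(subset, columns, labels):
    """CFS merit: high feature-class correlation, low feature-feature correlation."""
    k = len(subset)
    r_cf = sum(abs_pearson(columns[f], labels) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs_pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels, min_gain=1e-9):
    """Greedy forward search: keep adding the feature that most improves the merit."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        merits = {f: cfs_merit(selected + [f], columns, labels) for f in remaining}
        candidate = max(merits, key=merits.get)
        if merits[candidate] - best <= min_gain:
            break  # no remaining feature improves the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best = merits[candidate]
    return selected, best


# Toy data (hypothetical): every sixth patient is readmitted.
n = 120
labels = [1 if i % 6 == 0 else 0 for i in range(n)]
columns = {
    # Informative feature: matches the label, plus ten false positives.
    "signal": [1 if (i % 6 == 0 or i % 12 == 1) else 0 for i in range(n)],
    # Uninformative feature: balanced and unrelated to the label.
    "noise": [(i // 6) % 2 for i in range(n)],
}
# Redundant feature: an exact copy of "signal".
columns["duplicate"] = list(columns["signal"])

selected, merit = cfs_forward_select(columns, labels)
print(selected, round(merit, 4))  # ['signal'] 0.7746
```

In this toy setting the search keeps the informative feature and rejects both the exact duplicate and the uninformative one, which is the behavior a composite model needs when it pools features from several individual models. The selected features would then be passed to a classifier such as Naïve Bayes and evaluated by cross-validated AUC.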
