Improving Hospital Readmission Prediction Using Domain Knowledge Based Virtual Examples

In recent years, prediction of 30-day hospital readmission risk has received increased interest in healthcare predictive analytics because of its high human and financial impact. However, lack of data, high class and feature imbalance, and data sparsity make this task so challenging that most efforts to produce accurate data-driven readmission prediction models have failed. We address these problems by proposing a novel method for generating virtual examples that exploits the synergetic effect of data-driven models and domain knowledge by integrating qualitative knowledge and available data as complementary information sources. Domain knowledge, represented as the ICD-9 hierarchy of diagnoses, is used to characterize rare or unseen co-morbidities, which are presumed to have outcomes similar to those of their neighbors in the ICD-9 hierarchy. We evaluate the proposed method on 66,994 pediatric hospital discharge records from the California State Inpatient Databases (SID) of the Healthcare Cost and Utilization Project (HCUP), covering 2009 to 2011, and show improved accuracy of 30-day hospital readmission prediction compared to state-of-the-art alternative methods. We attribute this improvement to the fact that rare diseases have a high percentage of readmissions, and models based entirely on data usually fail to detect this qualitative information.
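
The idea of using the ICD-9 hierarchy to produce virtual examples can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that codes sharing the same category (the part before the decimal point) have similar outcomes, and that a record containing a rare code can be copied with that code swapped for an observed sibling while keeping the readmission label. The names `icd9_parent`, `virtual_examples`, and `min_count` are hypothetical.

```python
from collections import defaultdict


def icd9_parent(code: str) -> str:
    """Return the category part of an ICD-9 code, e.g. '493.90' -> '493'."""
    return code.split(".")[0]


def virtual_examples(records, min_count=2):
    """Generate virtual records by swapping rare codes for sibling codes.

    records: list of (set of ICD-9 code strings, readmission label).
    A code is treated as rare if it occurs in fewer than min_count records
    (an illustrative threshold, not taken from the paper).
    """
    counts = defaultdict(int)    # how many records contain each code
    siblings = defaultdict(set)  # observed codes grouped by parent category
    for codes, _ in records:
        for code in codes:
            counts[code] += 1
            siblings[icd9_parent(code)].add(code)

    virtual = []
    for codes, label in records:
        for code in sorted(codes):
            if counts[code] >= min_count:
                continue  # only rare codes are expanded
            # Replace the rare code with each observed sibling not already present,
            # assuming the outcome carries over within the same category.
            for sib in sorted(siblings[icd9_parent(code)] - codes):
                virtual.append(((codes - {code}) | {sib}, label))
    return virtual


records = [
    ({"493.90", "250.00"}, 1),
    ({"493.00", "250.00"}, 0),
    ({"250.00"}, 0),
]
for codes, label in virtual_examples(records):
    print(sorted(codes), label)
```

In this toy input the two asthma codes under category 493 are each rare, so each record containing one yields a virtual copy with the other. A real implementation would need the full ICD-9 hierarchy and a principled way to weight virtual examples against observed ones.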
