Novel Approach to Predict Hospital Readmissions Using Feature Selection from Unstructured Data with Class Imbalance

Abstract: Feature selection for predictive analytics remains a major challenge in the healthcare industry, particularly for readmission prediction. Several studies on mining healthcare data have focused on structured data for readmission prediction. Even among the works based on unstructured data, significant gaps remain in addressing class imbalance and context-specific noise removal, which necessitates new approaches to readmission prediction using unstructured data. In this work, a novel approach is proposed for feature selection and domain-related stop-word removal from unstructured data with class imbalance in discharge summary notes. The proposed predictive model uses these features along with other relevant structured data. Five iterations of predictions were performed to tune and improve the models; the results are presented and analyzed in this paper. The authors suggest future directions for implementing the proposed approach in hospitals or clinics, aimed at leveraging structured and unstructured discharge summary notes.
