An improved support vector machine-based diabetic readmission prediction

BACKGROUND AND OBJECTIVE In healthcare systems, the cost of unplanned readmissions accounts for a large proportion of total hospital payments, and hospital-specific readmission rates have become a critical issue worldwide. Quantifying unplanned readmission risk and identifying it early can improve the quality of care during hospitalization and reduce the occurrence of readmissions. In clinical practice, medical workers generally use the LACE score to evaluate patients' readmission risk, but this method usually performs poorly. With this in mind, this study presents a novel method that combines a support vector machine with a genetic algorithm to build a risk prediction model, handling feature selection and imbalanced data simultaneously. The model aims to provide decision support for clinicians during the discharge management of patients with diabetes.
METHOD The experiments were conducted on a set of 8756 medical records with 50 different features related to diabetic readmission. After the data were preprocessed, an effective SMOTE-based method was proposed to address the imbalanced-data problem. Next, to improve prediction performance, a hybrid feature selection mechanism was devised to select the important features. Subsequently, an improved support vector machine-based (SVM-based) method was developed, and a genetic algorithm was used to tune the algorithm's sensitive parameter. Finally, five-fold cross-validation was applied to compare the performance of the proposed method with that of other methods (LACE score, logistic regression, naïve Bayes, decision tree, and feed-forward neural networks).
RESULTS Experimental results indicate that the proposed SVM-based method achieves an accuracy of 81.02%, a sensitivity of 82.89%, and a specificity of 79.23%, and that it outperforms other popular algorithms in identifying diabetic patients who may be readmitted.
CONCLUSIONS Our research can improve the performance of clinical decision support systems for diabetic readmission, thereby helping to reduce both the likelihood of readmission and the waste of medical resources.
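For reference, the LACE baseline sums four components. These are Length of stay, Acuity of admission, Comorbidity (Charlson comorbidity index), and Emergency department visits in the previous six months. The sketch below uses the point table as commonly published by van Walraven et al. (2010). The function names and the high-risk cut-off of 10 are illustrative assumptions, because cut-offs vary between studies.

```python
def lace_score(length_of_stay_days: int, emergent_admission: bool,
               charlson_index: int, ed_visits_6mo: int) -> int:
    """Return the LACE index (0-19) from its four components."""
    # L: length of stay, in whole days
    los = length_of_stay_days
    if los < 1:
        l_pts = 0
    elif los <= 3:
        l_pts = los  # 1, 2 or 3 days score 1, 2 or 3 points
    elif los <= 6:
        l_pts = 4
    elif los <= 13:
        l_pts = 5
    else:
        l_pts = 7
    # A: acuity, 3 points if the index admission was emergent (unplanned)
    a_pts = 3 if emergent_admission else 0
    # C: Charlson comorbidity index; 0-3 score as-is, 4 or more scores 5
    c_pts = charlson_index if charlson_index <= 3 else 5
    # E: emergency department visits in the previous six months, capped at 4
    e_pts = min(ed_visits_6mo, 4)
    return l_pts + a_pts + c_pts + e_pts


# Hypothetical cut-off: 10 is common in the literature, but sites differ.
HIGH_RISK_THRESHOLD = 10


def is_high_risk(score: int, threshold: int = HIGH_RISK_THRESHOLD) -> bool:
    """Flag a patient as high risk when the LACE score reaches the threshold."""
    return score >= threshold
```

For example, `lace_score(5, True, 2, 1)` gives 4 + 3 + 2 + 1 = 10, which the assumed cut-off flags as high risk.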
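The abstract describes its oversampling step only as "SMOTE-based". As a point of reference, here is a minimal sketch of the classic SMOTE interpolation (Chawla et al., 2002), not the authors' variant. Each synthetic record lies on the segment between a minority-class record and one of its k nearest minority neighbors. The following choices are assumptions: the base record is picked at random, distance is Euclidean, and k = 5. Features are also assumed to be numeric and already scaled.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic records by interpolating between minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Assumption: draw the base record at random (the original paper loops over all of them)
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x by Euclidean distance, excluding x itself
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbor = minority[rng.choice(others[:k])]
        # New point = x + gap * (neighbor - x), with gap uniform in [0, 1)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic
```

Typically, only minority-class (readmitted) records are passed in. The output is appended to the training data and never to the evaluation data.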
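The abstract does not say how its hybrid feature selection mechanism works. A common hybrid design pairs a cheap filter with a wrapper. The sketch below assumes exactly that: a correlation filter followed by greedy forward selection. It is not the paper's mechanism. `score_subset` is a hypothetical callback that, in practice, would return the cross-validated performance of a classifier trained on the given columns.

```python
import math


def abs_correlation(xs, ys):
    """|Pearson r| between one feature column and the 0/1 labels (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def hybrid_select(X, y, score_subset, keep_top=10):
    """Filter stage ranks features by |r| with y; the wrapper stage then greedily adds
    the candidate that most improves score_subset(selected columns)."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Filter: keep only the keep_top features most correlated with the label
    ranked = sorted(range(n_features),
                    key=lambda j: abs_correlation(columns[j], y), reverse=True)
    candidates = ranked[:keep_top]
    # Wrapper: sequential forward selection, stopping when no candidate helps
    selected, best = [], float("-inf")
    while candidates:
        scores = {j: score_subset(selected + [j]) for j in candidates}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:
            break
        best = scores[j_best]
        selected.append(j_best)
        candidates.remove(j_best)
    return selected, best
```

The filter keeps the number of expensive wrapper evaluations small. With 50 features, as in the abstract's dataset, this matters when each evaluation retrains a model.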
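The abstract states that a genetic algorithm tunes a sensitive parameter of the SVM but does not name the parameter or describe the operators. A minimal real-coded GA is sketched below under two assumptions. First, the parameter is searched in log space, for example x = log10 γ for an RBF kernel. Second, the operators are tournament selection, blend crossover, Gaussian mutation, and elitism. The `fitness` callback is hypothetical. In practice it would train the SVM at that setting and return a cross-validated score; the test uses a toy quadratic instead.

```python
import random


def ga_tune(fitness, low, high, pop_size=20, generations=30,
            mutation_scale=0.1, elite=2, seed=0):
    """Real-coded genetic algorithm that maximizes fitness(x) for x in [low, high]."""
    rng = random.Random(seed)
    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate each individual once per generation (fitness calls are expensive)
        scored = {x: fitness(x) for x in pop}
        ranked = sorted(pop, key=scored.get, reverse=True)
        next_pop = ranked[:elite]  # elitism: the best individuals survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of 3 random individuals becomes a parent
            p1 = max(rng.sample(pop, 3), key=scored.get)
            p2 = max(rng.sample(pop, 3), key=scored.get)
            # Arithmetic (blend) crossover
            w = rng.random()
            child = w * p1 + (1 - w) * p2
            # Gaussian mutation, clipped back into the search range
            child += rng.gauss(0.0, mutation_scale * (high - low))
            next_pop.append(min(high, max(low, child)))
        pop = next_pop
    return max(pop, key=fitness)
```

If x is log10 γ, the tuned kernel width would be `10 ** ga_tune(...)`. Searching in log space lets the GA explore several orders of magnitude evenly.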
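The comparison in the abstract relies on five-fold cross-validation. One detail matters when oversampling is involved: synthetic records must be generated from the training folds only. Otherwise, information from the held-out fold leaks into training and the estimates become optimistic. The sketch below makes three assumptions. The folds are stratified. `fit_predict` and `oversample` are hypothetical callbacks, for instance an SVM and a SMOTE routine. `binary_metrics` computes the three measures reported in the abstract, with sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP).

```python
import random


def stratified_folds(y, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio of y."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin into folds
    return folds


def cross_val_predict(X, y, fit_predict, oversample=None, k=5, seed=0):
    """Out-of-fold predictions; oversampling touches only the training part of each split."""
    preds = [None] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        if oversample is not None:
            X_tr, y_tr = oversample(X_tr, y_tr)  # the test fold is never resampled
        test_preds = fit_predict(X_tr, y_tr, [X[i] for i in test_idx])
        for i, p in zip(test_idx, test_preds):
            preds[i] = p
    return preds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Collecting out-of-fold predictions and scoring them once gives pooled metrics. Averaging per-fold metrics is an equally common alternative.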
