Empirical thresholding logistic regression model based on unbalanced cardiac patient data

Cardiac disease causes widespread morbidity and mortality. Past research in this area has focused on risk factors and treatment, and little work exists on patient survival classification in emergency room situations with unbalanced data. The current study addresses this gap using more than 2,000 cardiac patient records. This unbalanced dataset was used to develop an empirical thresholding logistic regression model that predicts patient survival. The model was refined using stepwise and cost-efficient methods. The analysis revealed important factors that influence patient survival and suggested that a thresholding logistic regression model can provide a flexible and pragmatic way to handle unbalanced cardiac patient data. The model identified key factors that can help doctors concentrate on the indicators most closely related to patient survival. This study offers novel technical and practical insights for instant survival analysis of cardiac patients using an unbalanced dataset.
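As an illustration of the general idea of thresholding a logistic regression on unbalanced data, the sketch below fits a plain logistic regression and then picks the decision threshold that minimizes an empirical misclassification cost instead of using the default 0.5. It is not the study's model: the synthetic data, the two features, the 10% minority rate, the cost values, and all function names (`fit_logistic`, `total_cost`, `best_threshold`) are assumptions made for this example only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def total_cost(probs, y, threshold, cost_fn=5.0, cost_fp=1.0):
    # Hypothetical costs: missing a minority-class case (false negative)
    # is assumed to be five times worse than a false alarm.
    cost = 0.0
    for p, yi in zip(probs, y):
        pred = 1 if p >= threshold else 0
        if pred == 0 and yi == 1:
            cost += cost_fn
        elif pred == 1 and yi == 0:
            cost += cost_fp
    return cost

def best_threshold(probs, y, cost_fn=5.0, cost_fp=1.0):
    # Empirical search over a grid of candidate thresholds 0.01 .. 0.99.
    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: total_cost(probs, y, t, cost_fn, cost_fp))

# Synthetic unbalanced data (assumption): about 10% of records are class 1.
random.seed(0)
X, y = [], []
for _ in range(1000):
    label = 1 if random.random() < 0.1 else 0
    X.append([random.gauss(1.5 if label else 0.0, 1.0),
              random.gauss(-1.0 if label else 0.0, 1.0)])
    y.append(label)

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)

# For brevity the threshold is tuned on the training data; a real workflow
# would tune it on held-out records.
threshold = best_threshold(probs, y)
print("chosen threshold:", threshold)
print("cost at 0.5:", total_cost(probs, y, 0.5))
print("cost at chosen threshold:", total_cost(probs, y, threshold))
```

Because the minority class is rare, the fitted probabilities for it tend to be low, so a fixed 0.5 cutoff can miss many minority cases. Moving the cutoff changes only the decision rule and leaves the fitted coefficients untouched, which is one reason thresholding is a lightweight alternative to resampling the data.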
