Survival Prediction of Heart Failure Patients using Stacked Ensemble Machine Learning Algorithm

Cardiovascular disease, especially heart failure, is one of the major health hazards of our time and a leading cause of death worldwide. Advances in data mining techniques using machine learning (ML) models are paving the way for promising prediction approaches. Data mining is the process of converting the massive volumes of raw data created by healthcare institutions into meaningful information that can aid in making predictions and crucial decisions. The key aim of this study is to collect various follow-up data from patients who have had heart failure, analyze those data, and use several ML models to predict the survival of cardiovascular patients. Because the classes in the dataset are imbalanced, the Synthetic Minority Oversampling Technique (SMOTE) has been applied. Two unsupervised models (K-Means and Fuzzy C-Means clustering) and three supervised classifiers (Random Forest, XGBoost, and Decision Tree) have been used in our study. After thorough investigation, our results demonstrate superior performance of the supervised ML algorithms over the unsupervised models. Moreover, we design and propose a supervised stacked ensemble learning model that can achieve an accuracy, precision, recall, and F1 score of 99.98%. Our study shows that only certain attributes collected from the patients are essential to successfully predict survival after heart failure using supervised ML algorithms.

Keywords: Cardiovascular disease, Heart failure, Ensemble machine learning, Clustering, Random Forest, XGBoost, Decision Tree.
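The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed on the line segment between an existing minority sample and one of its nearest minority-class neighbours. This is a simplified, standard-library-only illustration, not the implementation used in the study (the abstract does not say which one was used). The function name `smote`, its parameters, and the toy feature vectors are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper, not the study's code).

    Creates n_synthetic points, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up two-feature vectors for illustration; NOT the study's data.
majority = [[60.0, 1.0], [55.0, 1.1], [70.0, 0.9], [65.0, 1.2], [50.0, 1.0], [62.0, 0.8]]
minority = [[30.0, 2.5], [25.0, 3.0], [35.0, 2.0]]

# Generate just enough synthetic samples to balance the two classes.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point is an interpolation between two real minority samples, it always stays within the range of the observed minority data. In practice, a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would normally be preferred, and oversampling is usually applied to the training split only so that synthetic samples do not leak into the evaluation data.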
