Heart Disease Prediction Using Stacking Model With Balancing Techniques and Dimensionality Reduction

Heart disease is one of the leading causes of mortality worldwide, so early detection is crucial. Machine Learning (ML) is now actively used in biomedical, healthcare, and health prediction applications. This research proposes PaRSEL, a new stacking model that combines four classifiers at the base layer, Passive Aggressive Classifier (PAC), Ridge Classifier (RC), Stochastic Gradient Descent Classifier (SGDC), and eXtreme Gradient Boosting (XGBoost), and deploys LogitBoost at the meta layer for the final predictions. Class imbalance and irrelevant features in the data increase the complexity of classification models, so dimensionality reduction and data balancing are important for lowering costs and increasing model accuracy. In PaRSEL, three dimensionality reduction techniques, Recursive Feature Elimination (RFE), Linear Discriminant Analysis (LDA), and Factor Analysis (FA), are used to reduce the dimensionality and select the most relevant features for the diagnosis of heart disease. Furthermore, eight balancing techniques, Proximity Weighted Random Affine Shadowsampling (ProWRAS), Localized Randomized Affine Shadowsampling (LoRAS), Random Over Sampling (ROS), Adaptive Synthetic (ADASYN), Synthetic Minority Oversampling Technique (SMOTE), Borderline SMOTE (B-SMOTE), Majority Weighted Minority Oversampling Technique (MWMOTE), and Random Walk Oversampling (RWOS), are used to deal with the imbalanced nature of the dataset. The performance of PaRSEL is compared with that of standalone classifiers using accuracy, F1-score, precision, recall, and AUC-ROC score. The proposed model achieves 97% accuracy, 80% F1-score, precision greater than 90%, 67% recall, and 98% AUC-ROC score, which shows that PaRSEL outperforms the standalone classifiers in heart disease prediction. Additionally, we apply SHapley Additive exPlanations (SHAP) to the proposed model to help explain its internal workings: SHAP illustrates how much influence a classifier has on the final prediction outcome.
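The pipeline described above (balance the training data, reduce the feature set, then stack several base classifiers under a meta learner) can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation, and several parts are assumptions made only to keep the example self-contained: the data are synthetic (`make_classification`), `GradientBoostingClassifier` stands in for XGBoost, `LogisticRegression` stands in for the LogitBoost meta learner, only one of the three reduction techniques (RFE) and one of the eight balancing techniques (ROS, written here as the hypothetical helper `random_oversample`) are shown, and all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import (LogisticRegression, PassiveAggressiveClassifier,
                                  RidgeClassifier, SGDClassifier)
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def random_oversample(X, y, seed=0):
    """Random Over Sampling (ROS): resample each class with replacement
    until it matches the size of the largest class. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if target > count:
            extra = rng.choice(idx, size=target - count, replace=True)
            idx = np.concatenate([idx, extra])
        parts.append(idx)
    order = rng.permutation(np.concatenate(parts))
    return X[order], y[order]


# Synthetic imbalanced data (assumption: roughly 10% positive class).
X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Balance only the training split so the test split keeps its true class ratio.
X_bal, y_bal = random_oversample(X_train, y_train)

# Base layer: PAC, RC, SGDC, and a gradient-boosting stand-in for XGBoost.
base_learners = [
    ("pac", PassiveAggressiveClassifier(max_iter=1000, random_state=0)),
    ("rc", RidgeClassifier()),
    ("sgdc", SGDClassifier(max_iter=1000, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
]
# Meta layer: logistic regression as a stand-in for LogitBoost.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))

# RFE keeps 8 of the 12 features (arbitrary choice) before stacking.
model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=8),
    stack,
)
model.fit(X_bal, y_bal)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "auc_roc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Oversampling is applied after the train/test split so that duplicated minority samples cannot leak into the evaluation set. The printed values come from synthetic data and are not comparable to the results reported above.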
