Random forest swarm optimization-based for heart diseases diagnosis

Heart disease has been one of the leading causes of death worldwide in recent years. Among diagnostic methods for heart disease, angiography is one of the most common, but it is costly and has side effects. Given the difficulty of predicting heart disease, data mining can play an important role in predicting it accurately. In this paper, a new approach to predicting heart disease is proposed that combines multi-objective particle swarm optimization (MOPSO) with Random Forest. The main goal is to produce diverse and accurate decision trees and, simultaneously, to determine their (near) optimal number. Instead of the mechanisms commonly used in Random Forest, i.e., bootstrap sampling, random feature selection, and random selection of the number of training sets, the method uses an evolutionary multi-objective approach. In this way, a different training set, with its own samples and features, is generated for each tree. In addition, the solutions in the obtained Pareto-optimal front determine the number of training sets required to build the random forest. As a result, the random forest's performance can be enhanced and its prediction accuracy improved. The effectiveness of the proposed method is investigated by comparing its performance on six heart disease datasets with that of individual and ensemble classifiers. The results suggest that the proposed method, with the (near) optimal number of classifiers, outperforms the random forest algorithm with different numbers of classifiers.
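The particle encoding, the diversity measure, and the exact MOPSO variant are not specified here. The following is a minimal sketch under stated assumptions: each candidate training set is encoded as binary masks over training samples and features, the two objectives are accuracy and diversity (both maximized), and each non-dominated candidate trains one tree, so the size of the Pareto front sets the number of trees. `Candidate`, `dominates`, `pareto_front`, `binary_pso_step`, the coefficient values, and all scores are hypothetical. The update rule is the generic sigmoid-based binary PSO step, not necessarily the one used in the paper.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate training set for one tree (hypothetical encoding).

    A 1 in sample_mask / feature_mask means the corresponding training row /
    feature is used, in place of bootstrap sampling and random feature choice.
    """
    sample_mask: tuple
    feature_mask: tuple
    accuracy: float   # objective 1, maximized
    diversity: float  # objective 2, maximized


def dominates(a: Candidate, b: Candidate) -> bool:
    """Pareto dominance for two maximized objectives."""
    no_worse = a.accuracy >= b.accuracy and a.diversity >= b.diversity
    better = a.accuracy > b.accuracy or a.diversity > b.diversity
    return no_worse and better


def pareto_front(cands: list) -> list:
    """Keep the non-dominated candidates; each one would train one tree,
    so len(front) plays the role of the number of trees."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def binary_pso_step(bits, velocity, personal_best, global_best, rng,
                    w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One generic binary-PSO update with a sigmoid transfer function."""
    new_bits, new_vel = [], []
    for x, v, p, g in zip(bits, velocity, personal_best, global_best):
        # Inertia + pull toward the personal best + pull toward a leader
        # (in MOPSO, the leader is typically drawn from the non-dominated archive).
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))          # clamp velocity
        prob_one = 1.0 / (1.0 + math.exp(-v))   # sigmoid -> P(bit = 1)
        new_bits.append(1 if rng.random() < prob_one else 0)
        new_vel.append(v)
    return new_bits, new_vel


if __name__ == "__main__":
    # Hypothetical objective scores for four candidate training sets.
    cands = [
        Candidate((1, 0, 1, 1), (1, 1, 0), accuracy=0.84, diversity=0.20),
        Candidate((0, 1, 1, 1), (0, 1, 1), accuracy=0.80, diversity=0.35),
        Candidate((1, 1, 0, 1), (1, 0, 1), accuracy=0.78, diversity=0.30),
        Candidate((1, 1, 1, 0), (1, 1, 1), accuracy=0.86, diversity=0.10),
    ]
    print("trees in forest:", len(pareto_front(cands)))  # 3: third is dominated
    rng = random.Random(0)
    bits, vel = binary_pso_step([1, 0, 1, 1], [0.0] * 4,
                                [1, 0, 1, 1], [0, 1, 1, 1], rng)
    print("updated sample mask:", bits)
```

In this toy run the third candidate (accuracy 0.78, diversity 0.30) is dominated by the second, so the forest would contain three trees. In a full implementation, the accuracy and diversity of each candidate would presumably come from training and evaluating a decision tree on that candidate's selected samples and features.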
