Association between work-related features and coronary artery disease: A heterogeneous hybrid feature selection integrated with balancing approach

Abstract Coronary artery disease (CAD) is a leading cause of death worldwide and is associated with high healthcare expenditure. Researchers are therefore motivated to apply machine learning (ML) for quick and accurate detection of CAD. The performance of such automated systems depends on the quality of the features used, and clinical CAD datasets contain features with varying degrees of association with CAD. To identify the features most strongly associated with CAD, we developed a novel hybrid feature selection algorithm called heterogeneous hybrid feature selection (2HFS). In this work, we used the Nasarian CAD dataset, which includes workplace and environmental features in addition to other clinical features. The synthetic minority over-sampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) are used to handle the class imbalance in the dataset. The 2HFS-selected features are then input into four classifiers: decision tree (DT), Gaussian Naive Bayes (GNB), random forest (RF), and XGBoost. Our results show that the proposed feature selection method yields a classification accuracy of 81.23% with SMOTE and the XGBoost classifier. We also tested our approach on other well-known CAD datasets, namely the Hungarian, Long-beach-va, and Z-Alizadeh Sani datasets, and obtained 83.94%, 81.58%, and 92.58%, respectively. Hence, our experimental results confirm the effectiveness of the proposed feature selection algorithm compared with existing state-of-the-art techniques for the development of automated CAD detection systems.
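To make the balancing step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is not the authors' implementation and does not cover 2HFS or ADASYN. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the base sample is picked at random rather than by iterating over every minority sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; a simplified SMOTE variant.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, sorted by distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 6 majority rows and 3 minority rows with two features each.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class until both classes have the same size.
new_rows = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind would normally be applied only to the training split so that synthetic points do not leak into the evaluation data.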

[1]  Vikram Pudi,et al.  Class Based Weighted K-Nearest Neighbor over Imbalance Dataset , 2013, PAKDD.

[2]  J. Vijayashree,et al.  A Machine Learning Framework for Feature Selection in Heart Disease Classification Using Improved Particle Swarm Optimization with Support Vector Machine Classifier , 2019, Programming and Computer Software.

[3]  Rajendra Prasad Mahapatra,et al.  Taylor and Gradient Descent-Based Actor Critic Neural Network for the Classification of Privacy Preserved Medical Data , 2019, Big Data.

[4]  Vehbi Cagri Gungor,et al.  Diagnosis of Coronary Heart Disease via Classification Algorithms and a New Feature Selection Methodology , 2019 .

[5]  Metin Akay,et al.  Noninvasive diagnosis of coronary artery disease using a neural network algorithm , 1993, Biological Cybernetics.

[6]  C. Held,et al.  Psychosocial stress and major cardiovascular events in patients with stable coronary heart disease , 2018, Journal of internal medicine.

[7]  Qiang Guan,et al.  APPLICATION OF ENSEMBLE ALGORITHM INTEGRATING MULTIPLE CRITERIA FEATURE SELECTION IN CORONARY HEART DISEASE DETECTION , 2017 .

[8]  Neelu Khare,et al.  An Efficient System for Heart Disease Prediction Using Hybrid OFBAT with Rule-Based Fuzzy Logic Model , 2017, J. Circuits Syst. Comput..

[9]  Omar H. Karam,et al.  Feature Analysis of Coronary Artery Heart Disease Data Sets , 2015 .

[10]  G. Tholkappia Arasu,et al.  Rough Set Theory and Fuzzy Logic Based Warehousing of Heterogeneous Clinical Databases , 2017, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[11]  Raimund Erbel,et al.  Perceived job insecurity as a risk factor for incident coronary heart disease: systematic review and meta-analysis , 2013, BMJ.

[12]  Agnieszka Wosiak,et al.  Integrating Correlation-Based Feature Selection and Clustering for Improved Cardiovascular Disease Diagnosis , 2018, Complex..

[13]  Ümit Kiliç,et al.  Feature Selection with Artificial Bee Colony Algorithm on Z-Alizadeh Sani Dataset , 2018, 2018 Innovations in Intelligent Systems and Applications Conference (ASYU).

[14]  Sanjay Kumar Dubey,et al.  Sudden Cardiac Arrest Prediction Using Predictive Analytics , 2017 .

[15]  Jafar Habibi,et al.  A data mining approach for diagnosis of coronary artery disease , 2013, Comput. Methods Programs Biomed..

[16]  U. Rajendra Acharya,et al.  NE-nu-SVC: A New Nested Ensemble Clinical Decision Support System for Effective Diagnosis of Coronary Artery Disease , 2019, IEEE Access.

[17]  Jafar Habibi,et al.  Diagnosis of Coronary Artery Disease Using Data Mining Based on Lab Data and Echo Features , 2012, Journal of Medical and Bioengineering.

[18]  Md. Shah Jalal Performance Evaluation of Machine Learning Algorithms for Coronary Artery Disease Features , 2019 .

[19]  Haibo He,et al.  ADASYN: Adaptive synthetic sampling approach for imbalanced learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[20]  Jafar Habibi,et al.  Diagnosis of Coronary Artery Disease Using Data Mining Techniques Based on Symptoms and ECG Features , 2012 .

[21]  M. A. H. Akhand,et al.  Genetic algorithm based fuzzy decision support system for the diagnosis of heart disease , 2016, 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV).

[22]  Jafar Habibi,et al.  Coronary artery disease detection using computational intelligence methods , 2016, Knowl. Based Syst..

[23]  Azam Dekamin,et al.  A Data Mining Approach for Coronary Artery Disease Prediction in Iran , 2017 .

[24]  Robert M. Nishikawa,et al.  Computer-aided Detection and Diagnosis , 2010 .

[25]  Grazyna Bochenek,et al.  The Relationship of Metabolic Syndrome with Stress, Coronary Heart Disease and Pulmonary Function - An Occupational Cohort-Based Study , 2015, PloS one.

[26]  Sangeet Srivastava,et al.  A Hybrid Data Mining Model to Predict Coronary Artery Disease Cases Using Non-Invasive Clinical Data , 2016, Journal of Medical Systems.

[27]  Amir Lerman,et al.  Association Between Work‐Related Stress and Coronary Heart Disease: A Review of Prospective Studies Through the Job Strain, Effort‐Reward Balance, and Organizational Justice Models , 2018, Journal of the American Heart Association.

[28]  H. Parveen Sultana,et al.  Artificial gravitational cuckoo search algorithm along with particle bee optimized associative memory neural network for feature selection in heart disease classification , 2019, Journal of Ambient Intelligence and Humanized Computing.

[29]  K. AnoojP.,et al.  Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules , 2012, J. King Saud Univ. Comput. Inf. Sci..

[30]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[31]  Andrew Steptoe,et al.  Effects of stress on the development and progression of cardiovascular disease , 2018, Nature Reviews Cardiology.

[32]  Hedieh Sajedi,et al.  Prediction of disease based on prescription using data mining methods , 2019 .

[33]  Joel E. W. Koh,et al.  Entropies for automated detection of coronary artery disease using ECG signals: A review , 2018 .

[34]  Muhammad Awais,et al.  Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines , 2018, Knowledge and Information Systems.

[35]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[36]  Anne E Price Heart disease and work , 2004, Heart.

[37]  H. Mannila,et al.  Data mining: machine learning, statistics, and databases , 1996, Proceedings of 8th International Conference on Scientific and Statistical Data Base Management.

[38]  Roohallah Alizadehsani,et al.  Exerting Cost-Sensitive and Feature Creation Algorithms for Coronary Artery Disease Diagnosis , 2012, Int. J. Knowl. Discov. Bioinform..

[39]  S. P. Shantharajah,et al.  An optimized feature selection based on genetic approach and support vector machine for heart disease , 2018, Cluster Computing.

[40]  Moloud Abdar,et al.  Using Decision Trees in Data Mining for Predicting Factors Influencing of Heart Disease , 2015 .

[41]  Cheryl R. Clark,et al.  Financial Stress and Risk of Coronary Heart Disease in the Jackson Heart Study. , 2019, American journal of preventive medicine.

[42]  Roohallah Alizadehsani,et al.  Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm , 2017, Comput. Methods Programs Biomed..

[43]  Asma Ghandeharioun,et al.  Diagnosis of Coronary Arteries Stenosis Using Data Mining , 2012, Journal of medical signals and sensors.

[44]  Onur Osman,et al.  A novel method for pulmonary embolism detection in CTA images , 2014, Comput. Methods Programs Biomed..

[45]  Shusaku Tsumoto,et al.  Communications and Discoveries from Multidisciplinary Data , 2008, Communications and Discoveries from Multidisciplinary Data.

[46]  Wei Ding,et al.  Learning weighted distance metric from group level information and its parallel implementation , 2016, Applied Intelligence.

[47]  Rainer Goebel,et al.  Fast Gaussian Naïve Bayes for searchlight classification analysis , 2017, NeuroImage.

[48]  P. K. Anooj,et al.  Clinical decision support system: Risk level prediction of heart disease using weighted fuzzy rules , 2012, J. King Saud Univ. Comput. Inf. Sci..

[49]  Hamido Fujita,et al.  Computer Aided detection for fibrillations and flutters using deep convolutional neural network , 2019, Inf. Sci..

[50]  T. Tamura,et al.  27th Annual Inter national Conference of the IEEE Engineering in Medicine and Biology Society , 2005 .

[51]  U. Rajendra Acharya,et al.  A new machine learning technique for an accurate diagnosis of coronary artery disease , 2019, Comput. Methods Programs Biomed..

[52]  Kasturi Dewi Varathan,et al.  Identification of significant features and data mining techniques in predicting heart disease , 2019, Telematics Informatics.

[53]  Ying-Tsang Lo,et al.  PREDICTION OF CORONARY ARTERY DISEASE BASED ON ENSEMBLE LEARNING APPROACHES AND CO-EXPRESSED OBSERVATIONS , 2016 .

[54]  F. Kobayashi,et al.  Job Stress and Stroke and Coronary Heart Disease , 2004 .

[55]  Moloud Abdar,et al.  Using PSO Algorithm for Producing Best Rules in Diagnosis of Heart Disease , 2017, 2017 International Conference on Computer and Applications (ICCA).

[56]  Andrea Esuli,et al.  How Data Mining and Machine Learning Evolved from Relational Data Base to Data Science , 2018, A Comprehensive Guide Through the Italian Database Research.

[57]  J. Vijayashree,et al.  Heart disease classification using hybridized Ruzzo-Tompa memetic based deep trained Neocognitron neural network , 2020, Health and Technology.

[58]  Roohallah Alizadehsani,et al.  Diagnosis of Coronary Artery Disease Using Cost-Sensitive Algorithms , 2012, 2012 IEEE 12th International Conference on Data Mining Workshops.

[59]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res..

[60]  David A. Landgrebe,et al.  A survey of decision tree classifier methodology , 1991, IEEE Trans. Syst. Man Cybern..

[61]  Donghong Ji,et al.  Novel framework for image attribute annotation with gene selection XGBoost algorithm and relative attribute model , 2019, Appl. Soft Comput..

[62]  Douglas K. S. Ng,et al.  An image feature approach for computer-aided detection of ischemic stroke , 2011, Comput. Biol. Medicine.

[63]  Moloud Abdar,et al.  A Novel Effective Ensemble Model for Early Detection of Coronary Artery Disease , 2019 .

[64]  Ali Cüvitoğlu,et al.  Classification of CAD dataset by using principal component analysis and machine learning approaches , 2018, 2018 5th International Conference on Electrical and Electronic Engineering (ICEEE).