Heart Disease Prediction Using Classification (Naive Bayes)

This paper aims to promote a better understanding and wider use of machine learning in the medical sector. It compares the performance of six classification models applied to the University of California, Irvine (UCI) Cleveland Heart Disease records to predict coronary artery disease (CAD). Initially, all 13 independent features provided in the dataset were used to build the models. Comparing their accuracies showed that K-nearest neighbors (KNN), support vector machine (SVM), and Naive Bayes performed as expected and better than the remaining models. Feature selection was then applied to improve prediction accuracy: the backward elimination method and a filter method based on the Pearson correlation coefficient were used to select the most predictive features. Comparing the accuracy of the models trained on all features with that of the models trained on the selected features showed that feature selection significantly enhanced the performance of Naive Bayes and random forest, while the other models did not perform as expected. After feature selection, Naive Bayes achieved an accuracy of 88.16% on the test set.
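The abstract does not say how the feature selection or the classifier was implemented, so the following is only a minimal, self-contained sketch in standard-library Python of the three ingredients it names. Several choices are assumptions rather than the paper's method: the Naive Bayes variant is taken to be Gaussian (one normal likelihood per class and feature); the Pearson filter keeps features whose absolute correlation with the label reaches a hypothetical `threshold`; and backward elimination is shown as a greedy wrapper that drops a feature whenever doing so does not lower accuracy on a held-out validation set (backward elimination is also commonly driven by regression p-values instead). All function and class names are hypothetical, and the toy data are invented for illustration, not taken from the Cleveland records.

```python
import math
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def pearson_filter(X, y, threshold):
    """Filter method: keep indices of features with |r(feature, label)| >= threshold.

    The threshold value is a hypothetical tuning parameter.
    """
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns) if abs(pearson(col, y)) >= threshold]


class GaussianNaiveBayes:
    """Naive Bayes assuming (for illustration) a normal likelihood per class and feature."""

    def fit(self, X, y, eps=1e-9):
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            self.stats[c] = (
                math.log(len(rows) / len(y)),            # log prior P(c)
                [mean(col) for col in cols],             # per-feature means
                [pvariance(col) + eps for col in cols],  # per-feature variances, smoothed
            )
        return self

    def _log_posterior(self, x, c):
        # Unnormalized log P(c | x) = log P(c) + sum_j log N(x_j; mu_cj, var_cj),
        # using the "naive" assumption that features are independent given the class.
        log_prior, mus, variances = self.stats[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xj - mu) ** 2 / (2 * v)
            for xj, mu, v in zip(x, mus, variances)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(x, c)) for x in X]


def select(X, idx):
    """Project every row onto the feature indices in idx."""
    return [[row[j] for j in idx] for row in X]


def accuracy(X_tr, y_tr, X_val, y_val, idx):
    """Validation accuracy of a Gaussian Naive Bayes trained on the features in idx."""
    model = GaussianNaiveBayes().fit(select(X_tr, idx), y_tr)
    preds = model.predict(select(X_val, idx))
    return sum(p == t for p, t in zip(preds, y_val)) / len(y_val)


def backward_elimination(X_tr, y_tr, X_val, y_val):
    """Hypothetical wrapper variant: start from all features and greedily drop
    one whose removal does not lower validation accuracy, until none can go."""
    idx = list(range(len(X_tr[0])))
    best = accuracy(X_tr, y_tr, X_val, y_val, idx)
    improved = True
    while improved and len(idx) > 1:
        improved = False
        for j in idx:
            trial = [k for k in idx if k != j]
            score = accuracy(X_tr, y_tr, X_val, y_val, trial)
            if score >= best:
                idx, best, improved = trial, score, True
                break  # restart the scan over the reduced feature set
    return idx, best


if __name__ == "__main__":
    # Invented toy data: feature 0 separates the classes, features 1 and 2 are noise.
    X = [[1.0, 5.0, 3.0], [1.2, 4.0, 3.1], [0.9, 6.0, 2.9],
         [3.0, 5.5, 3.0], [3.2, 4.5, 3.2], [2.9, 5.0, 2.8]]
    y = [0, 0, 0, 1, 1, 1]
    X_val, y_val = [[1.1, 5.0, 3.0], [3.1, 5.0, 3.0]], [0, 1]
    print(pearson_filter(X, y, threshold=0.5))       # [0]
    print(backward_elimination(X, y, X_val, y_val))  # ([0], 1.0)
```

Candidate feature subsets are scored on a validation split rather than on the test set: choosing features by their test-set accuracy would make the reported test accuracy optimistically biased.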
