Naive Bayes for Statlog Heart Database with consideration of data specifics

Heart disease is one of the main causes of mortality today, and the situation is expected to worsen due to factors such as aging, diabetes and obesity. Misdiagnosis of patients reporting heart-related ailments worsens it further. This paper analyzes a probabilistic approach to recognizing heart disease by applying Naive Bayes to the Statlog Heart Database and searching for data preprocessing techniques that improve it. A discretization algorithm for numerical attributes that takes the specifics of the given heart disease patients into account is presented. It is based on supervised discretization with consideration of Equal Frequency Discretization. Experiments using 10-fold cross-validation show improvements in accuracy, measured by sensitivity, specificity and their sum, and the results are also compared with other classification algorithms.
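To make the terms concrete, the following is a minimal sketch of plain (unsupervised) Equal Frequency Discretization and of the sensitivity and specificity measures named in the abstract. It is not the supervised algorithm proposed in the paper, whose details the abstract does not give. The function names, the choice of three bins and the sample age values are assumptions made only for illustration.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return cut points that split `values` into bins of roughly equal count.

    Plain equal-frequency binning (an illustrative assumption, not the
    paper's supervised variant). Duplicate cut points are dropped, so
    fewer than n_bins - 1 cuts may be returned for heavily tied data.
    """
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[k * n // n_bins]
        if not cuts or cut > cuts[-1]:
            cuts.append(cut)
    return cuts


def discretize(value, cuts):
    """Map a numeric value to a bin index; a value equal to a cut goes to the upper bin."""
    return bisect_right(cuts, value)


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Compute sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patient ages, invented for this example only.
ages = [29, 34, 41, 45, 52, 54, 58, 60, 63, 67, 70, 77]
cuts = equal_frequency_cut_points(ages, n_bins=3)
age_bins = [discretize(a, cuts) for a in ages]

# Hypothetical labels and predictions (1 = disease present).
sens, spec = sensitivity_specificity(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 0, 1],
)
combined = sens + spec  # the "sum" criterion mentioned in the abstract
```

The discretized bin indices could then be fed to a categorical Naive Bayes classifier and evaluated fold by fold. A sum of sensitivity and specificity above 1 indicates performance better than chance on both classes together.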
