Predictive and descriptive analysis for heart disease diagnosis

Heart disease describes a range of conditions affecting the heart. It includes blood vessel diseases such as coronary artery disease, heart rhythm problems, and heart defects. The term is often used for cardiovascular disease, i.e. narrowed or blocked blood vessels that can lead to a heart attack, chest pain, or stroke. In our work, we analysed three available data sets: Heart Disease Database, South African Heart Disease and Z-Alizadeh Sani Dataset. We pursued two directions: a predictive analysis based on Decision Trees, Naive Bayes, Support Vector Machine and Neural Networks, and a descriptive analysis based on association and decision rules. Our results are plausible and, in some cases, comparable to or better than those reported in related work.
