Early Prediction of Heart Diseases Using Data Mining Techniques

The largest-ever study of deaths shows that heart diseases have emerged as the number one killer in the world. About 25 per cent of deaths in the 25-69 age group occur because of heart diseases. If all age groups are included, heart diseases account for about 19 per cent of all deaths. Heart disease is the leading cause of death among males as well as females, and in all regions, though the numbers vary: the proportion of deaths caused by heart disease is highest in south India (25 per cent) and lowest in the central region of India (12 per cent). Predicting heart disease survivability has been a challenging research problem for many researchers. Since the early days of this research, much advancement has been recorded in several related fields. The main objective of this manuscript is therefore to report on a research project in which we took advantage of these technological advancements to develop prediction models for heart disease survivability. We used three popular data mining algorithms, CART (Classification and Regression Tree), ID3 (Iterative Dichotomiser 3) and decision table (DT) extracted from a decision tree or rule-based classifier, to develop the prediction models using a large dataset. We also used 10-fold cross-validation to obtain an unbiased estimate of the models' performance.
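To make the evaluation procedure concrete, the following is a minimal sketch of 10-fold cross-validation around a CART-style classifier. It is not the authors' implementation: the one-split tree (a decision stump chosen by Gini impurity), the helper names `gini`, `fit_stump` and `k_fold_accuracy`, and the synthetic records with made-up `age` and `cholesterol` columns are all assumptions introduced for illustration only.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (the CART split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Fit a one-split CART-style tree by minimising weighted Gini impurity.

    Returns a function that maps one feature row to a predicted label.
    """
    majority = Counter(y).most_common(1)[0][0]
    best = None  # (score, feature index, threshold, left label, right label)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        return lambda row: majority  # no valid split: predict the majority class
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def k_fold_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds; each record is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit_stump([X[j] for j in train], [y[j] for j in train])
        correct = sum(model(X[j]) == y[j] for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


# Synthetic, hypothetical records: (age, cholesterol). Not data from the study.
rng = random.Random(1)
X = [(rng.randint(30, 70), rng.randint(150, 320)) for _ in range(200)]
y = [1 if chol > 240 else 0 for _, chol in X]  # toy labelling rule, assumed

if __name__ == "__main__":
    print("10-fold mean accuracy:", k_fold_accuracy(X, y, k=10))
```

Because every record is held out once and the model is refit on the remaining nine folds each time, the averaged accuracy is measured only on data the model did not see during fitting, which is the sense in which the estimate is meant to be unbiased. A full CART or ID3 tree would recurse on each side of the split (ID3 using information gain on categorical attributes); the stump is used here only to keep the sketch short.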
