A hybrid prediction model for type 2 diabetes using K-means and decision tree

Type 2 diabetes has a high incidence worldwide, and early detection is essential for its prevention and treatment. Data mining techniques are gaining importance in medical diagnosis because of their classification capability. In this paper, we propose a hybrid prediction model to support the diagnosis of Type 2 diabetes. The model uses K-means for data reduction and a J48 decision tree as the classifier. We evaluated it on the Pima Indians Diabetes Dataset from the UCI Machine Learning Repository. The results show that the proposed model achieves better accuracy than previous studies reported in the literature. These results suggest that the proposed model would be helpful in diagnosing Type 2 diabetes.
