Performance evaluation of machine learning classification techniques for Diabetes disease

Diabetes is a noncommunicable disease, and type 2 diabetes mellitus is among the top five leading causes of death worldwide. When a patient's status goes unrecognized, complications such as kidney disease, neuropathy, and retinopathy can develop and eventually lead to death. Determining a patient's status with machine learning techniques can support early treatment and thereby help reduce these burdens. In this work, the researchers focused on evaluating patients' diabetes status. The Cross-Industry Standard Process for Data Mining (CRISP-DM) was used as the research methodology, and Support Vector Machine, Decision Tree, and Naive Bayes were used as classification techniques. The study aims to predict patient status in order to reduce the complications caused by diabetes. The dataset used for the models is the Pima Indians Diabetes Database (PIDD), obtained from the UCI Machine Learning Repository, with 768 records in total. Comparing the results of the different models showed that the k-nearest neighbors (KNN) algorithm performed best, with an accuracy of 76% on the condensed dataset with nine attributes.
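The abstract reports a 76% accuracy for KNN but does not state the preprocessing, the value of k, or how records were split into training and test sets. The following is a minimal, standard-library-only sketch of a k-nearest-neighbors classifier scored by accuracy (correct predictions divided by total predictions). It assumes z-score standardization fitted on the training rows only, Euclidean distance, and k = 3. These choices, the function names `standardize`, `knn_predict`, and `accuracy`, and the two-feature toy rows are illustrative assumptions, not the study's configuration or the PIDD data.

```python
import math
from collections import Counter


def standardize(train, test):
    """Z-score each feature using means/stds computed from the training rows only."""
    n_features = len(train[0])
    means = [sum(row[j] for row in train) / len(train) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((row[j] - means[j]) ** 2 for row in train) / len(train)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns

    def scale(rows):
        return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in rows]

    return scale(train), scale(test)


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training rows (Euclidean)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy rows with two made-up numeric features; NOT records from PIDD.
train_x = [[85, 26.0], [90, 24.5], [95, 28.0], [100, 27.0],
           [150, 35.0], [160, 33.5], [170, 38.0], [155, 36.0]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
# The last test row is a deliberately hard case labeled 0, so accuracy is below 1.
test_x = [[92, 25.0], [165, 37.0], [145, 31.0]]
test_y = [0, 1, 0]

train_s, test_s = standardize(train_x, test_x)
preds = [knn_predict(train_s, train_y, q, k=3) for q in test_s]
acc = accuracy(test_y, preds)
print(preds, round(acc, 3))  # [0, 1, 1] 0.667
```

Fitting the scaling statistics on the training rows alone keeps test information out of preprocessing; with an odd k and two classes, the majority vote cannot tie.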