Comparison of Classifiers for the Risk of Diabetes Prediction

Abstract: This paper applied several classification algorithms to predict the risk of diabetes mellitus. Four well-known classification models, namely Decision Tree, Artificial Neural Network, Logistic Regression and Naive Bayes, were examined first. Bagging and Boosting were then investigated as ways to improve the robustness of these models, and Random Forest was also evaluated. The findings suggest that Random Forest gives the best performance for classifying disease risk. Its model was therefore used to build a web application that predicts the diabetes risk class.
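The abstract names Bagging and Random Forest but does not describe how they work. The pure-Python code below is a minimal sketch of the general technique and not the authors' implementation, which the abstract does not specify. It grows depth-limited Gini trees on bootstrap samples and combines them by majority vote. Restricting each split to a random subset of features turns plain bagging into the Random Forest variant introduced by Breiman. All function names (`make_toy_data`, `random_forest`, `predict_forest`, and so on), the four-feature layout, and the settings are hypothetical. The data are synthetic and are not the diabetes data used in the paper.

```python
import random


def gini(pos, n):
    """Gini impurity of a node with `pos` positive labels out of `n` binary labels."""
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y, features):
    """Return (feature, threshold) with the lowest weighted Gini impurity, or None."""
    n, total_pos = len(y), sum(y)
    best, best_score = None, gini(total_pos, n)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_pos = 0
        for k in range(1, n):  # first k sorted rows go left, the rest go right
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # no threshold separates equal values
            score = (k * gini(left_pos, k) + (n - k) * gini(total_pos - left_pos, n - k)) / n
            if score < best_score:
                best, best_score = (f, lo), score  # rows with value <= lo go left
    return best


def build_tree(X, y, depth, max_features, rng):
    """Grow a depth-limited tree; leaves are majority labels, nodes are tuples."""
    n, pos = len(y), sum(y)
    majority = 1 if 2 * pos > n else 0
    if depth == 0 or pos in (0, n):
        return majority
    features = rng.sample(range(len(X[0])), max_features)  # random feature subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    left = [i for i in range(n) if X[i][f] <= t]
    right = [i for i in range(n) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, max_features, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, max_features, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X, y, n_trees=25, depth=4, max_features=None, seed=0):
    """Bagging of trees; max_features < n_features adds Random-Forest-style feature sampling."""
    rng = random.Random(seed)
    max_features = max_features or len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, max_features, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees (use an odd n_trees to avoid ties)."""
    votes = sum(predict_tree(tree, row) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0


def make_toy_data(n, rng):
    """Hypothetical synthetic data: 4 features in [0, 1]; only the first two matter."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(4)]
        X.append(row)
        y.append(1 if row[0] + row[1] + rng.gauss(0.0, 0.1) > 1.0 else 0)
    return X, y


data_rng = random.Random(42)
X_train, y_train = make_toy_data(400, data_rng)
X_test, y_test = make_toy_data(200, data_rng)
forest = random_forest(X_train, y_train, n_trees=25, depth=4, max_features=2)
accuracy = sum(predict_forest(forest, r) == lab for r, lab in zip(X_test, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Setting `max_features` to the total number of features (the default here) gives plain bagging. Comparing the two ensembles under otherwise identical settings therefore needs only one changed argument.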
