Classifiers evaluation: Comparison of performance classifiers based on tuples amount

This study compares the performance of several classifiers as the number of tuples varies. Several performance metrics are considered, including Accuracy, Mean Absolute Error (MAE), and the Kappa statistic. The experiments use a hospital readmission dataset of diabetic patients consisting of 47 features and 49,736 tuples. The methodology starts with a preprocessing phase. The cleaned dataset is then randomly divided into 5 subsets, each representing a multiple of 10,000 tuples. Each subset is evaluated with three traditional classifiers: Naive Bayes, k-Nearest Neighbor (k-NN), and Decision Tree. Several parameter settings are applied to each classifier except Naive Bayes. The validation method is 10-fold cross-validation. Finally, classifier performance is compared across the different numbers of tuples. The results indicate that the larger the number of tuples, the lower and weaker the MAE and Accuracy performance, whereas the Kappa statistic tends to fluctuate. Naive Bayes outperforms k-NN and Decision Tree overall. The best classifier performance was reached in the 20,000-tuple evaluation.
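
As an illustration of the evaluation described above, the following minimal Python sketch shows how accuracy, the Kappa statistic, and MAE can be computed from a classifier's predictions, and how tuples can be split into 10 folds. It is not the authors' implementation. The function names, the fixed random seed, and the toy labels are hypothetical, and the MAE definition used here (mean absolute difference between predicted class probabilities and 0/1 indicators of the actual class) is an assumption, since the abstract does not state how MAE was computed.

```python
import random
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of tuples whose predicted label equals the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def cohen_kappa(actual, predicted):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(actual)
    p_observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    p_chance = sum(
        actual_counts[c] * predicted_counts[c] for c in actual_counts
    ) / (n * n)
    if p_chance == 1.0:
        # Kappa is undefined when only one class occurs; this sketch reports 0.0.
        return 0.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_absolute_error(actual, probabilities, classes):
    """Assumed definition: average |predicted probability - 0/1 indicator|
    over all classes and all tuples."""
    total = 0.0
    for label, probs in zip(actual, probabilities):
        per_tuple = sum(
            abs(probs[c] - (1.0 if c == label else 0.0)) for c in classes
        )
        total += per_tuple / len(classes)
    return total / len(actual)


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Toy example with hypothetical labels and probabilities.
classes = ["yes", "no"]
actual = ["yes", "yes", "no", "no"]
predicted = ["yes", "no", "no", "no"]
probabilities = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.2, "no": 0.8},
    {"yes": 0.1, "no": 0.9},
]

print("accuracy:", accuracy(actual, predicted))
print("kappa:", cohen_kappa(actual, predicted))
print("MAE:", mean_absolute_error(actual, probabilities, classes))
print("fold sizes:", [len(fold) for fold in k_fold_indices(25)])
```

In a 10-fold run, each fold serves once as the test set while the remaining nine folds form the training set, and the metrics are computed over the pooled or averaged predictions. Repeating this for each subset size gives the per-size comparison the study describes.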
