Cross-validation Metrics for Evaluating Classification Performance on Imbalanced Data

Imbalanced data is a common problem in classification, because training on such data tends to fit the model too closely to the majority class. Ensemble techniques are one way to deal with imbalanced data. This paper compares metrics for measuring classification performance on imbalanced data through an empirical study on cabbage image classification. The metrics used were accuracy, F1 score, g-mean, the Matthews correlation coefficient (MCC), Cohen's kappa statistic, and the area under the ROC curve (AUC). We used three ensembles: bagging, Breiman boosting, and Freund boosting. The empirical results indicated that accuracy, F1 score, and g-mean gave values that did not reflect the actual confusion cases. Accuracy, F1 score, g-mean, MCC, and kappa showed the same values under different confusion matrix conditions, whereas AUC gave different values for different confusion matrices. Based on these results, AUC emerged as the robust metric for measuring performance under imbalanced conditions.
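For readers who want to see how the listed metrics are derived, the following is a minimal Python sketch using the standard textbook definitions for the binary case. It is not the code used in the study: the function names `binary_metrics` and `auc_from_scores`, the toy confusion matrix, and the toy scores are all hypothetical and chosen only for illustration. Undefined ratios (zero denominators) are set to 0 here, which is a convention assumed for this sketch.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Threshold-based metrics from a binary confusion matrix (hypothetical helper)."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    # Matthews correlation coefficient; set to 0 when a margin is empty (assumption).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean,
            "mcc": mcc, "kappa": kappa}


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Needs at least one example per class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced case: 5 minority (positive) and 95 majority (negative) examples.
metrics = binary_metrics(tp=1, fn=4, fp=1, tn=94)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")

# Toy scores for four examples (two negatives, two positives).
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(f"      auc: {toy_auc:.3f}")
```

In this toy case accuracy is high even though only one of the five minority examples is detected. Note also that the first five metrics depend only on the thresholded confusion matrix, while AUC is computed from the ranking of the scores.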
