A two-dimensional accuracy-based measure for classification performance

Abstract Accuracy has traditionally been used to evaluate the performance of classifiers. However, it is well known that accuracy cannot capture all the factors that characterize the performance of a multiclass classifier. In this manuscript, accuracy is studied and analyzed as a weighted average of the classification rate of each class. This perspective allows us to propose the dispersion of the per-class classification rates as its complementary measure. On this basis, a graphical performance metric, defined in a two-dimensional space composed of accuracy and dispersion, is proposed to evaluate the performance of classifiers. We show that the combined values of accuracy and dispersion must fall within a clearly bounded two-dimensional region, different for each problem. The shape of this region depends only on the a priori probability of each class, not on the classifier used. Thus, the performance of multiclass classifiers is represented in a two-dimensional space where models can be compared more fairly, providing greater awareness of which strategies are more effective when trying to improve a classifier's performance. Furthermore, we experimentally analyze the behavior of seven performance metrics computed from the confusion matrix in several scenarios, identifying clusters of and relationships between measures. As the experiments show, the proposed graphical metric is especially suitable for challenging datasets that are highly imbalanced and have a large number of classes. The proposed approach offers a novel point of view on the evaluation of multiclass classifiers and is an alternative to other evaluation measures used in machine learning.
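The abstract describes accuracy as a prior-weighted average of per-class classification rates, with the dispersion of those rates as a complementary second axis. The following is a minimal sketch of that decomposition, assuming dispersion is the standard deviation of the per-class rates (the paper's exact definition may differ) and using a hypothetical confusion matrix for illustration:

```python
import numpy as np

def accuracy_and_dispersion(cm):
    """Compute accuracy and per-class-rate dispersion from a confusion matrix.

    cm[i, j] counts instances of true class i predicted as class j.
    Accuracy is recovered as the average of the per-class classification
    rates weighted by each class's a priori probability (its relative
    frequency in the data), as described in the abstract.
    """
    cm = np.asarray(cm, dtype=float)
    class_counts = cm.sum(axis=1)               # instances per true class
    priors = class_counts / class_counts.sum()  # a priori class probabilities
    rates = np.diag(cm) / class_counts          # per-class classification rates
    accuracy = np.dot(priors, rates)            # equals trace(cm) / cm.sum()
    dispersion = rates.std()                    # spread of per-class rates
                                                # (assumed: standard deviation)
    return accuracy, dispersion

# Hypothetical 3-class imbalanced problem (100 / 50 / 20 instances per class)
cm = np.array([[90,  5,  5],
               [10, 30, 10],
               [ 5,  5, 10]])
acc, disp = accuracy_and_dispersion(cm)
print(f"accuracy={acc:.3f}, dispersion={disp:.3f}")  # accuracy=0.765, dispersion=0.170
```

In this example the overall accuracy is high (0.765) while the per-class rates range from 0.9 down to 0.5, which is exactly the imbalance-masking effect the second (dispersion) axis is meant to expose.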
