An empirical study of binary classifier fusion methods for multiclass classification

One of the most important topics in information fusion is the combination of individual classifiers in multi-classifier systems. There are two different tasks in this area: one is the training and construction of ensembles of classifiers, each of which can solve the multiclass problem; the other is the fusion of binary classifiers, each of which solves a different two-class problem, to construct a multiclass classifier. This paper is devoted to the study of several aspects of the fusion of binary classifiers to obtain a multiclass classifier. In the general case of a classification problem with more than two classes, many algorithms either work better with two-class problems or are specifically designed for them. In such cases, a binarization method that maps the multiclass problem into several two-class problems must be used. Information fusion plays a central role in this task because the predictions of the different binary classifiers must be combined into a multiclass classifier. The task raises several issues regarding the way binary learners are trained and combined. Issues such as individual accuracy, diversity, and independence are shared with other information fusion tasks, such as the construction of ensembles of classifiers. This paper presents a study of the different class binarization methods that have been proposed for standard multiclass classification problems, addressing aspects not considered in previous works. We are especially concerned with many of the general assumptions in the field that have not been fully assessed by experimentation. We test the different methods on a large set of real-world problems from the UCI Machine Learning Repository, using six different base learners. Our results corroborate some of the results previously reported in the literature.
Furthermore, we present new results regarding the influence of the base learner on the performance of each method. We also show new results on the behavior of binary testing error and the independence of binary classifiers depending on the coding strategy. Finally, we study the behavior of the methods when the number of classes is high and in the presence of noise.
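
As an illustration of what a binarization method and the subsequent fusion step can look like, the following minimal Python sketch builds coding matrices for two common schemes (one-vs-all and one-vs-one) and fuses binary predictions by choosing the class whose codeword disagrees least with them. The function names and the simple disagreement-count decoding are assumptions made for this example; they are not taken from the paper and do not reproduce its experimental setup.

```python
from itertools import combinations


def one_vs_all_matrix(n_classes):
    """Row k is the codeword of class k; column j is the problem 'class j vs. the rest'."""
    return [[1 if k == j else -1 for j in range(n_classes)]
            for k in range(n_classes)]


def one_vs_one_matrix(n_classes):
    """One column per class pair (a, b): +1 for a, -1 for b, 0 if the class is not used."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if k == a else -1 if k == b else 0 for a, b in pairs]
            for k in range(n_classes)]


def decode(matrix, binary_outputs):
    """Fuse binary predictions (+1/-1) into a class index.

    Illustrative rule: pick the class whose codeword has the fewest
    disagreements with the binary outputs, ignoring zero entries.
    Ties are broken in favor of the lowest class index.
    """
    def disagreements(codeword):
        return sum(1 for c, o in zip(codeword, binary_outputs)
                   if c != 0 and c != o)

    return min(range(len(matrix)), key=lambda k: disagreements(matrix[k]))


# Three classes, one-vs-one: the columns are the pairs (0,1), (0,2), (1,2).
ovo = one_vs_one_matrix(3)
# Hypothetical binary outputs: class 1 wins against 0, class 0 wins against 2,
# and class 1 wins against 2.
predicted_class = decode(ovo, [-1, 1, 1])
```

In this usage example the three binary outputs are consistent with class 1 on every pair in which it takes part, so the fusion step selects class 1. Other decoding rules and longer error-correcting codes fit the same matrix representation.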
