An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes

Classification problems involving multiple classes can be addressed in different ways. One of the most popular techniques consists of dividing the original data set into two-class subsets and learning a different binary model for each new subset. These techniques are known as binarization strategies. In this work, we are interested in ensemble methods based on binarization techniques; in particular, we focus on the well-known one-vs-one and one-vs-all decomposition strategies, paying special attention to the final step of the ensembles: the combination of the outputs of the binary classifiers. Our aim is to develop an empirical analysis of different aggregations to combine these outputs. To do so, we carry out a twofold study. First, we use different base classifiers in order to observe the suitability and potential of each combination within each classifier. Then, we compare the performance of these ensemble techniques with that of the classifiers themselves. Hence, we also analyse the improvement with respect to the classifiers that handle multiple classes inherently. We carry out the experimental study with several well-known algorithms from the literature, such as Support Vector Machines, Decision Trees, Instance Based Learning and Rule Based Systems. We show, supported by several statistical analyses, the benefit of the binarization techniques with respect to the base classifiers, and finally we point out the most robust techniques within this framework.
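As a rough illustration of the two decomposition strategies and of the aggregation step, the sketch below builds one-vs-one and one-vs-all ensembles over a toy binary learner. It is a minimal sketch under stated assumptions, not the experimental setup of the study: the nearest-centroid base learner, the class `BinaryCentroid`, the functions `fit_ovo`, `predict_ovo`, `fit_ova` and `predict_ova`, and the sample data are all hypothetical, and the two aggregations shown (majority voting for one-vs-one, maximum confidence for one-vs-all) are only the simplest of the combinations such a study could compare.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class BinaryCentroid:
    """Hypothetical binary base learner (nearest centroid).

    score(x) lies in [0, 1]; values above 0.5 favour the positive class.
    """

    def __init__(self, pos, neg):
        self.pos = centroid(pos)
        self.neg = centroid(neg)

    def score(self, x):
        dp, dn = sq_dist(x, self.pos), sq_dist(x, self.neg)
        return 0.5 if dp + dn == 0 else dn / (dp + dn)


def fit_ovo(data):
    """One-vs-one: one binary model per unordered pair of classes."""
    return {(a, b): BinaryCentroid(data[a], data[b])
            for a, b in combinations(sorted(data), 2)}


def predict_ovo(models, x):
    """Aggregate pairwise outputs by majority voting."""
    votes = {}
    for (a, b), model in models.items():
        winner = a if model.score(x) >= 0.5 else b
        votes[winner] = votes.get(winner, 0) + 1
    # Ties are broken by label order (an arbitrary choice for this sketch).
    return max(sorted(votes), key=votes.get)


def fit_ova(data):
    """One-vs-all: one binary model per class against all the others."""
    models = {}
    for label in data:
        rest = [p for other, pts in data.items() if other != label for p in pts]
        models[label] = BinaryCentroid(data[label], rest)
    return models


def predict_ova(models, x):
    """Aggregate by picking the class whose model is most confident."""
    scores = {label: model.score(x) for label, model in models.items()}
    return max(sorted(scores), key=scores.get)


# Made-up three-class data set in two dimensions.
data = {
    "a": [(0, 0), (0, 1), (1, 0)],
    "b": [(5, 5), (5, 6), (6, 5)],
    "c": [(0, 5), (0, 6), (1, 5)],
}
ovo_models = fit_ovo(data)
ova_models = fit_ova(data)
```

With three classes, one-vs-one trains three pairwise models and one-vs-all trains three class-against-rest models; in general the counts are m(m-1)/2 and m for m classes. Swapping the body of `predict_ovo` or `predict_ova` is where alternative aggregations would plug in.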
