Pruning of Error Correcting Output Codes by optimization of the accuracy–diversity trade-off

Ensemble learning combines multiple learners to obtain more reliable and accurate predictions in supervised and unsupervised learning. However, ensembles are sometimes unnecessarily large, which leads to additional memory usage, computational overhead and decreased effectiveness. Pruning algorithms have been developed to overcome these side effects, but because selecting a subset of ensemble members is a combinatorial problem, finding the optimal subset exactly is computationally infeasible. Various heuristic algorithms have been developed to obtain approximate solutions, but they lack theoretical guarantees. Error Correcting Output Codes (ECOC) is a well-known ensemble technique for multiclass classification that combines the outputs of binary base learners to predict the class of multiclass data. In this paper, we propose a novel approach for pruning the ECOC matrix that uses accuracy and diversity information simultaneously. Existing pruning methods all require the size of the pruned ensemble as a parameter, so their performance depends on the chosen size. Our pruning method is novel in that it is parameter-free and therefore independent of the ensemble size. Experimental results show that our pruning method performs better than other existing approaches in most cases.
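
The abstract does not specify the decoding rule or the accuracy and diversity measures the method uses, so the following is only a minimal sketch under stated assumptions. It assumes standard Hamming decoding, per-column accuracy measured against the code bit of the true class, and the pairwise disagreement measure (fraction of samples on which exactly one of two learners is correct) as the diversity measure. The code matrix, toy data and all function names are hypothetical, and the code does not implement the proposed pruning method; it only shows what "pruning the ECOC matrix" (keeping a subset of columns) and the two quantities being traded off look like.

```python
from itertools import combinations

# Hypothetical 4-class, 6-column ECOC code matrix (illustrative, not from the paper).
# Row k is the codeword of class k; column j is the binary problem that
# base learner j solves (classes relabelled as +1 or -1).
CODE_MATRIX = [
    [+1, +1, +1, -1, -1, -1],  # class 0
    [+1, -1, -1, +1, +1, -1],  # class 1
    [-1, +1, -1, +1, -1, +1],  # class 2
    [-1, -1, +1, -1, +1, +1],  # class 3
]


def hamming_decode(outputs, code_matrix, columns=None):
    """Return the class whose codeword is nearest to `outputs` in Hamming distance.

    `columns` optionally restricts decoding to a subset of columns,
    i.e. to a pruned ECOC matrix.
    """
    if columns is None:
        columns = range(len(code_matrix[0]))
    distances = [sum(row[j] != outputs[j] for j in columns) for row in code_matrix]
    # min() keeps the first minimum, so ties go to the lowest class index.
    return min(range(len(code_matrix)), key=distances.__getitem__)


def column_correctness(outputs, labels, code_matrix):
    """correct[j][i] is True when learner j outputs the code bit of sample i's true class."""
    return [
        [out[j] == code_matrix[y][j] for out, y in zip(outputs, labels)]
        for j in range(len(code_matrix[0]))
    ]


def column_accuracy(correct_j):
    """Fraction of samples on which a single base learner is correct."""
    return sum(correct_j) / len(correct_j)


def disagreement(correct_a, correct_b):
    """Pairwise disagreement: fraction of samples where exactly one learner is correct."""
    return sum(a != b for a, b in zip(correct_a, correct_b)) / len(correct_a)


def mean_diversity(correct, columns):
    """Average pairwise disagreement over all pairs of the given columns."""
    pairs = list(combinations(columns, 2))
    return sum(disagreement(correct[a], correct[b]) for a, b in pairs) / len(pairs)


# Toy validation set: one sample per class, with a single flipped bit in
# samples 0, 1 and 3 (column 5, column 0 and column 5 respectively).
LABELS = [0, 1, 2, 3]
OUTPUTS = [
    [+1, +1, +1, -1, -1, +1],
    [-1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, -1],
]

if __name__ == "__main__":
    correct = column_correctness(OUTPUTS, LABELS, CODE_MATRIX)
    full = list(range(6))
    pruned = [0, 1, 2, 3, 4]  # drop column 5, the least accurate learner
    for name, cols in (("full", full), ("pruned", pruned)):
        preds = [hamming_decode(o, CODE_MATRIX, cols) for o in OUTPUTS]
        acc = sum(column_accuracy(correct[j]) for j in cols) / len(cols)
        div = mean_diversity(correct, cols)
        print(name, preds, round(acc, 3), round(div, 3))
```

On this toy data both matrices classify all four samples correctly, but dropping the least accurate column raises the mean column accuracy from 0.875 to 0.95 while lowering the mean pairwise disagreement from 0.25 to 0.1. This is the kind of tension between accuracy and diversity that a pruning criterion has to weigh.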
