An incremental node embedding technique for error correcting output codes

The error correcting output codes (ECOC) technique is a useful way to extend any binary classifier to the multiclass case. The design of an ECOC matrix usually assumes a number of dichotomizers that is fixed a priori. We argue that both the choice and the number of dichotomizers should depend on the performance of the ensemble code on the problem domain. In this paper, we present a novel approach that improves the performance of any initial output coding by extending it in a sub-optimal way. The proposed strategy creates new dichotomizers that minimize the confusion among classes, as measured by the confusion matrix on a validation subset. A weighted methodology is proposed to take into account the different relevance of each dichotomizer. As a result, overfitting is avoided and small codes with good generalization performance are obtained. In the decoding step, we introduce a new strategy that follows the principle that positions coded with the symbol zero should have little influence on the results. We compare our strategy to other well-known ECOC strategies on data sets from the UCI repository, and the results show that it represents a significant improvement.
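To make the decoding principle concrete, the following is a minimal sketch of ternary ECOC decoding in which positions coded with zero do not contribute to the distance and each dichotomizer carries a weight. It is not the decoding strategy proposed in the paper: the coding matrix `CODE_MATRIX`, the function `decode`, and the normalized weighted Hamming distance are all assumptions chosen for illustration.

```python
# Hypothetical ternary coding matrix: one row per class, one column per
# dichotomizer. +1/-1 assign the class to one side of the binary problem;
# 0 means the dichotomizer was not trained on that class.
CODE_MATRIX = [
    [+1, +1,  0],  # class 0
    [-1,  0, +1],  # class 1
    [ 0, -1, -1],  # class 2
]


def decode(predictions, code_matrix, weights=None):
    """Return the index of the class whose codeword is closest to predictions.

    The distance is a weighted Hamming distance computed only over the
    nonzero positions of each codeword and normalized by the total weight
    of those positions, so zero-coded positions have no influence.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    best_class, best_dist = None, float("inf")
    for cls, codeword in enumerate(code_matrix):
        mismatch, total = 0.0, 0.0
        for bit, pred, w in zip(codeword, predictions, weights):
            if bit == 0:
                continue  # skip positions coded with zero
            total += w
            if bit != pred:
                mismatch += w
        dist = mismatch / total if total > 0 else float("inf")
        if dist < best_dist:
            best_class, best_dist = cls, dist
    return best_class


# Outputs of the three dichotomizers for one test sample.
print(decode([+1, -1, -1], CODE_MATRIX))
```

In this sketch the optional `weights` argument stands in for the idea that dichotomizers differ in relevance; how those weights would be estimated (for example, from validation accuracy) is left open here.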
