A study on Error Correcting Output Codes

Recent work points to advantages in decomposing multi-class decision problems into multiple binary problems. There are several strategies for this decomposition; the most used and studied are all-vs-all, one-vs-all, and error-correcting output codes (ECOCs). ECOCs originated in telecommunications, where their capacity to correct transmission errors comes from the redundancy introduced when encoding messages. ECOCs are binary words and can be adapted for use in classification problems, but they must respect some specific constraints: the binary words must be as far apart as possible, no two columns may be equal or complementary, and no column may be constant (all 1s or all 0s). Given two ECOCs that satisfy these constraints, which one is more appropriate for classification purposes? In this work we propose a function for evaluating the quality of ECOCs. This function guides the search in the persecution algorithm, a new method for generating ECOCs for classification purposes. The binary words that form an ECOC can have several possible dimensions for the same number of classes they represent, and the number of possible dimensions grows exponentially with the number of classes of the multi-class problem. In this paper we present a method for choosing the dimension of the ECOC that ensures a good tradeoff between redundancy and error-correcting capacity. The method is evaluated on a set of benchmark classification problems, where experimental results are competitive with standard decomposition methods.
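The column and row constraints stated above, together with nearest-codeword decoding, can be sketched in a few lines. The code below is a minimal illustration, not the paper's evaluation function or persecution algorithm; the function names are ours, and the example matrix is the standard exhaustive code for four classes (2^(k-1) - 1 = 7 columns).

```python
def hamming(u, v):
    """Number of positions where two equal-length binary words differ."""
    return sum(a != b for a, b in zip(u, v))

def ecoc_is_valid(M):
    """Check the constraints from the abstract on a 0/1 code matrix M,
    given as a list of rows (one codeword per class)."""
    n_classes, n_bits = len(M), len(M[0])
    # Rows must be pairwise distinct: distance 0 means two classes
    # share a codeword and can never be told apart.
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            if hamming(M[i], M[j]) == 0:
                return False
    cols = [tuple(row[b] for row in M) for b in range(n_bits)]
    # No constant column: an all-0 or all-1 column defines no binary problem.
    if any(len(set(c)) == 1 for c in cols):
        return False
    # No equal or complementary columns: both would train the same
    # binary classifier twice and add no error-correcting power.
    for a in range(n_bits):
        for b in range(a + 1, n_bits):
            comp = tuple(1 - x for x in cols[b])
            if cols[a] == cols[b] or cols[a] == comp:
                return False
    return True

def ecoc_decode(M, bits):
    """Nearest-codeword decoding: return the index of the class whose
    codeword is closest in Hamming distance to the binary outputs."""
    return min(range(len(M)), key=lambda i: hamming(M[i], bits))

# Exhaustive code for 4 classes: minimum row distance 4, so a single
# flipped binary-classifier output is still corrected.
M = [[1, 1, 1, 1, 1, 1, 1],
     [0, 0, 0, 0, 1, 1, 1],
     [0, 0, 1, 1, 0, 0, 1],
     [0, 1, 0, 1, 0, 1, 0]]
ecoc_is_valid(M)                        # → True
ecoc_decode(M, [0, 0, 1, 1, 0, 0, 0])  # class 2's codeword with one bit flipped → 2
```

A minimum row distance d corrects up to floor((d - 1) / 2) individual classifier errors, which is the tradeoff between redundancy and error-correcting capacity that the dimension-selection method addresses.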
