Joint learning of error-correcting output codes and dichotomizers from data

The error-correcting output codes (ECOC) technique is a powerful tool for learning and combining multiple binary learners for multi-class classification. It generally involves three steps: coding, dichotomizer learning, and decoding. In previous ECOC methods, the coding step and the dichotomizer learning step are usually performed independently. This simplifies the learning problem but may lead to unsatisfactory decoding results. To address this, we propose a novel model that learns the ECOC matrix and the dichotomizers jointly from data. We formulate the model as a nonlinear programming problem and develop an efficient alternating minimization algorithm to solve it. Specifically, for a fixed ECOC matrix, the model decomposes into a group of mutually independent quadratic programming problems; for fixed dichotomizers, it is a difference of convex functions problem that can be readily solved with the concave-convex procedure. Experimental results on ten data sets from the UCI machine learning repository demonstrate the advantage of our model over state-of-the-art ECOC methods.
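
As a point of reference, the conventional three-step pipeline (coding, dichotomizer learning, decoding) can be sketched as follows. This is a minimal illustration of the independent baseline that the abstract contrasts against, not the joint model proposed here. It assumes a fixed one-vs-all code matrix, regularized least-squares linear dichotomizers, and nearest-codeword decoding; all function names, the `ridge` parameter, and the toy data are hypothetical choices made for the example.

```python
import numpy as np

def one_vs_all_code(n_classes):
    """Coding step (assumed scheme): K x K matrix, +1 on the diagonal, -1 elsewhere."""
    return 2.0 * np.eye(n_classes) - 1.0

def fit_dichotomizers(X, y, code, ridge=1e-3):
    """Dichotomizer learning: one regularized least-squares linear learner per code column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias feature
    T = code[y]                                    # n x L targets in {+1, -1}
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ T)            # (d + 1) x L weight matrix

def decode(X, W, code):
    """Decoding: assign each sample to the class whose codeword is nearest to its outputs."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    F = Xb @ W                                     # n x L real-valued dichotomizer outputs
    dists = ((F[:, None, :] - code[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

# Toy data (hypothetical): three well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + 0.3 * rng.standard_normal((90, 2))

code = one_vs_all_code(3)          # step 1: coding, fixed in advance
W = fit_dichotomizers(X, y, code)  # step 2: learners trained for that fixed code
pred = decode(X, W, code)          # step 3: decoding
accuracy = float((pred == y).mean())
print(f"training accuracy: {accuracy:.2f}")
```

In this sketch the code matrix is chosen before any learner is trained and is never revisited, which is the decoupling the abstract describes. A joint approach would instead alternate between updating `W` for a fixed `code` and updating `code` for a fixed `W`.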
