Erratum: Constructing Multiclass Learners from Binary Learners: A Simple Black-Box Analysis of the Generalization Errors

Multiclass learning problems are commonly solved by reduction to a set of binary problems. Treating the base binary classifiers as black boxes, we analyze the generalization errors of various constructions, including Max-Win, Decision Directed Acyclic Graphs, Adaptive Directed Acyclic Graphs, and the unifying approach of Allwein, Schapire, and Singer based on a coding matrix with Hamming decoding, using only elementary probabilistic tools. Many of these bounds are new, and some are much simpler than those previously known. The technique also yields a simple proof of the equivalence, in both learnability and polynomial learnability, between the multiclass problem and the induced pairwise problems.
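As a rough illustration of the coding-matrix construction with Hamming decoding mentioned above, the following Python sketch decodes a vector of black-box binary predictions into a class label. It is not taken from the paper: the function name `hamming_decode`, the example matrix `ALL_PAIRS_3`, and the convention that a 0 entry ("classifier not trained on this class") contributes 1/2 to the distance are assumptions chosen for illustration.

```python
import numpy as np


def hamming_decode(coding_matrix, binary_predictions):
    """Return the index of the class whose codeword is closest to the predictions.

    coding_matrix: (num_classes, num_classifiers) array with entries in {-1, 0, +1}.
    binary_predictions: length num_classifiers array with entries in {-1, +1}.
    """
    M = np.asarray(coding_matrix)
    f = np.asarray(binary_predictions)
    # Per-entry cost: 0 on agreement, 1 on disagreement, 1/2 where the
    # codeword entry is 0 (assumed convention for "don't care" entries).
    distances = ((1 - M * f) / 2).sum(axis=1)
    return int(np.argmin(distances))


# Hypothetical all-pairs coding matrix for 3 classes.
# Columns correspond to the binary problems (0 vs 1), (0 vs 2), (1 vs 2).
ALL_PAIRS_3 = np.array([
    [+1, +1,  0],
    [-1,  0, +1],
    [ 0, -1, -1],
])

# Classifier (0 vs 1) votes for class 1, (0 vs 2) for class 0, (1 vs 2) for class 1.
predicted_class = hamming_decode(ALL_PAIRS_3, [-1, +1, +1])
```

With an all-pairs matrix like this one, the distance to a class decreases as that class wins more pairwise contests, so the decoded label matches a Max-Win vote whenever there is a unique winner.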

[1]  Venkatesan Guruswami,et al.  Multiclass learning, boosting, and error-correcting codes , 1999, COLT '99.

[2]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[3]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[4]  John Shawe-Taylor,et al.  A framework for structural risk minimisation , 1996, COLT '96.

[5]  Dan Roth,et al.  Constraint Classification: A New Approach to Multiclass Classification , 2002, ALT.

[6]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[7]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res.

[8]  Boonserm Kijsirikul,et al.  Adaptive Directed Acyclic Graphs for Multiclass Classification , 2002, PRICAI.

[9]  Boonserm Kijsirikul,et al.  Reordering adaptive directed acyclic graphs: an improved algorithm for multiclass support vector machines , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003.

[10]  Vladimir Vapnik,et al.  On the uniform convergence of relative frequencies of events to their probabilities , 1971.

[11]  Ulrich H.-G. Kreßel,et al.  Pairwise classification and support vector machines , 1999.

[12]  H. Paugam-Moisy,et al.  Generalization performance of multiclass discriminant models , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[13]  Thomas P. Hayes,et al.  Error limiting reductions between classification tasks , 2005, ICML.

[14]  Mitsuru Ishizuka,et al.  PRICAI 2002: Trends in Artificial Intelligence , 2002, Lecture Notes in Computer Science.

[15]  Alon Orlitsky,et al.  On Nearest-Neighbor Error-Correcting Output Codes with Application to All-Pairs Multiclass Support Vector Machines , 2003, J. Mach. Learn. Res.

[16]  Thomas G. Dietterich,et al.  Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs , 1991, AAAI.

[17]  Philip M. Long,et al.  Characterizations of Learnability for Classes of {0, ..., n}-Valued Functions , 1995, J. Comput. Syst. Sci..

[18]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res.

[19]  Nello Cristianini,et al.  Enlarging the Margins in Perceptron Decision Trees , 2000, Machine Learning.

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998.

[21]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[22]  B. Natarajan.  On learning sets and functions , 2004, Machine Learning.

[23]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[24]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[25]  Daphna Weinshall,et al.  Learning with Equivalence Constraints and the Relation to Multiclass Learning , 2003, COLT.

[26]  Wanchai Rivepiboon,et al.  Reordering Adaptive Directed Acyclic Graphs for Multiclass Support Vector Machines , 2003, J. Adv. Comput. Intell. Intell. Informatics.