Constraint Classification: A New Approach to Multiclass Classification

In this paper, we present a new view of multiclass classification and introduce the constraint classification problem, a generalization that captures many flavors of multiclass classification. We provide the first optimal, distribution-independent bounds for many multiclass learning algorithms, including winner-take-all (WTA). Based on this view, we present an algorithm that learns a multiclass classifier via a single linear classifier in a higher-dimensional space. In addition to the distribution-independent bounds, we provide a simple margin-based analysis that improves generalization bounds for linear multiclass support vector machines.
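The idea of learning a multiclass classifier through a single linear classifier in a higher-dimensional space can be illustrated with a small sketch. The abstract does not spell out the construction, so the code below assumes a Kesler-style expansion: each example (x, y) with k classes becomes k - 1 vectors in R^(k*d), one per constraint "class y must outscore class j", and a plain perceptron is trained on them. The function names (`expand`, `train`, `predict`), the perceptron update, and the toy data are illustrative assumptions, not the paper's implementation.

```python
from typing import List, Sequence, Tuple


def expand(x: Sequence[float], k: int, i: int, j: int) -> List[float]:
    """Embed x into R^(k*d): +x in block i, -x in block j, zeros elsewhere.

    With w the concatenation of k per-class weight vectors,
    w . expand(x, k, i, j) = w_i . x - w_j . x, so a positive dot
    product encodes the constraint "class i outscores class j".
    """
    d = len(x)
    z = [0.0] * (k * d)
    for t, v in enumerate(x):
        z[i * d + t] = v
        z[j * d + t] = -v
    return z


def train(data: Sequence[Tuple[Sequence[float], int]], k: int,
          epochs: int = 50) -> List[float]:
    """Perceptron on the expanded examples (an assumed, simple learner)."""
    d = len(data[0][0])
    w = [0.0] * (k * d)
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            for j in range(k):
                if j == y:
                    continue
                z = expand(x, k, y, j)
                # Constraint violated (or tied): move w toward z.
                if sum(wi * zi for wi, zi in zip(w, z)) <= 0:
                    w = [wi + zi for wi, zi in zip(w, z)]
                    mistakes += 1
        if mistakes == 0:  # every pairwise constraint is satisfied
            break
    return w


def predict(w: Sequence[float], x: Sequence[float], k: int) -> int:
    """Winner-take-all: return the class whose weight block scores highest."""
    d = len(x)
    scores = [sum(w[c * d + t] * x[t] for t in range(d)) for c in range(k)]
    return max(range(k), key=scores.__getitem__)


# Toy, separable three-class data (hypothetical).
data = [((1.0, 0.0), 0), ((0.0, 1.0), 1), ((-1.0, -1.0), 2)]
w = train(data, k=3)
print([predict(w, x, 3) for x, _ in data])
```

The single weight vector `w` is just the k per-class weight vectors laid end to end, so prediction reduces to the winner-take-all rule named in the abstract.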
