A comparative study of multi-class support vector machines in the unifying framework of large margin classifiers

Vapnik's statistical learning theory has mainly been developed for two types of problems: pattern recognition (computation of dichotomies) and regression (estimation of real-valued functions). Only in recent years has multi-class discriminant analysis been studied in its own right. Extending several standard results, including a well-known theorem of Bartlett, we have derived distribution-free uniform strong laws of large numbers for multi-class large margin discriminant models. The capacity measure appearing in the confidence interval, a covering number, has been bounded from above in terms of a new generalized VC dimension. In this paper, these theorems are applied to the architecture shared by all the multi-class SVMs proposed so far. This provides a simple theoretical framework in which to study these machines, compare their performance, and design new ones. Copyright © 2005 John Wiley & Sons, Ltd.
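As a rough illustration of the kind of architecture these results concern, the sketch below assumes Q real-valued component functions h_k, one per category, with a prediction made by taking the largest output. The multi-class margin of a labelled example (x, y) is taken here as (h_y(x) - max_{k != y} h_k(x)) / 2, one common convention in this literature, and the empirical margin risk is the fraction of examples whose margin falls below a threshold gamma. In Bartlett-style bounds, this empirical margin risk plus a capacity-dependent confidence term bounds the probability of error from above. The linear parametrization h_k(x) = w_k . x + b_k (standing in for kernel expansions) and all function names are hypothetical choices for illustration, not the paper's notation.

```python
from typing import List, Sequence, Tuple

# Hypothetical linear parametrization: one (weight vector, bias) pair per category.
Model = List[Tuple[List[float], float]]
Example = Tuple[Sequence[float], int]


def scores(model: Model, x: Sequence[float]) -> List[float]:
    """Outputs h_k(x) = w_k . x + b_k of the Q component functions."""
    return [sum(wi * xi for wi, xi in zip(w, x)) + b for w, b in model]


def predict(model: Model, x: Sequence[float]) -> int:
    """Assign x to the category whose component function is largest."""
    h = scores(model, x)
    return max(range(len(h)), key=lambda k: h[k])


def margin(model: Model, x: Sequence[float], y: int) -> float:
    """Multi-class margin (h_y(x) - max_{k != y} h_k(x)) / 2.

    A positive value means x is correctly classified with some slack.
    """
    h = scores(model, x)
    best_other = max(h[k] for k in range(len(h)) if k != y)
    return (h[y] - best_other) / 2.0


def empirical_margin_risk(model: Model, data: Sequence[Example], gamma: float) -> float:
    """Fraction of labelled examples whose margin is strictly below gamma."""
    return sum(1 for x, y in data if margin(model, x, y) < gamma) / len(data)


# Toy usage: three categories in the plane (illustrative data only).
model: Model = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([-1.0, -1.0], 0.0)]
data: List[Example] = [([2.0, 0.0], 0), ([0.0, 2.0], 1), ([-1.0, -1.0], 2), ([1.0, 1.0], 0)]
print(empirical_margin_risk(model, data, 0.5))  # 0.25: only ([1, 1], 0) has margin 0
```

With gamma = 0 the same toy set has empirical margin risk 0, since no example is misclassified; raising gamma counts correctly classified but low-margin points as errors, which is the trade-off the margin-based confidence intervals quantify.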
