Multiclass LS-SVMs: Moderated Outputs and Coding-Decoding Schemes

A common way of solving the multiclass categorization problem is to reformulate the problem into a set of binary classification problems. Discriminative binary classifiers like, e.g., Support Vector Machines (SVMs), directly optimize the decision boundary with respect to a certain cost function. In a pragmatic and computationally simple approach, Least Squares SVMs (LS-SVMs) are inferred by minimizing a related regression least squares cost function. The moderated outputs of the binary classifiers are obtained in a second step within the evidence framework. In this paper, Bayes' rule is repeatedly applied to infer the posterior multiclass probabilities, using the moderated outputs of the binary plug-in classifiers and the prior multiclass probabilities. This Bayesian decoding motivates the use of loss function based decoding instead of Hamming decoding. For SVMs and LS-SVMs with linear kernel, experimental evidence suggests the use of one-versus-one coding. With a Radial Basis Function kernel one-versus-one and error correcting output codes yield the best performances, but simpler codings may still yield satisfactory results.

[1]  Ulrich H.-G. Kreßel,et al.  Pairwise classification and support vector machines , 1999 .

[2]  Johan A. K. Suykens,et al.  Bayesian Framework for Least-Squares Support Vector Machine Classifiers, Gaussian Processes, and Kernel Fisher Discriminant Analysis , 2002, Neural Computation.

[3]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[4]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[5]  David Mackay,et al.  Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks , 1995 .

[6]  G. Baudat,et al.  Generalized Discriminant Analysis Using a Kernel Approach , 2000, Neural Computation.

[7]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[8]  Johan A. K. Suykens,et al.  Financial time series prediction using least squares support vector machines within the evidence framework , 2001, IEEE Trans. Neural Networks.

[9]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[10]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[11]  B. Scholkopf,et al.  Fisher discriminant analysis with kernels , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[12]  Alexander Gammerman,et al.  Ridge Regression Learning Algorithm in Dual Variables , 1998, ICML.

[13]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[14]  R. Tibshirani,et al.  Flexible Discriminant Analysis by Optimal Scoring , 1994 .

[15]  Christopher K. I. Williams Prediction with Gaussian Processes: From Linear Regression to Linear Prediction and Beyond , 1999, Learning in Graphical Models.

[16]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[17]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[18]  Johan A. K. Suykens,et al.  Least Squares Support Vector Machine Classifiers , 1999, Neural Processing Letters.

[19]  Johan A. K. Suykens,et al.  Multiclass least squares support vector machines , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[20]  James T. Kwok,et al.  The evidence framework applied to support vector machines , 2000, IEEE Trans. Neural Networks Learn. Syst..

[21]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[22]  Terrence J. Sejnowski,et al.  Parallel Networks that Learn to Pronounce English Text , 1987, Complex Syst..

[23]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[24]  James T. Kwok Moderating the outputs of support vector machine classifiers , 1999, IEEE Trans. Neural Networks.

[25]  Wolfgang Utschick,et al.  A Regularization Method for Non-Trivial Codes in Polychotomous Classifications , 1998, Int. J. Pattern Recognit. Artif. Intell..