Bayesian Multicategory Support Vector Machines

We show that the multi-class support vector machine (MSVM) proposed by Lee et al. (2004) can be viewed as a maximum a posteriori (MAP) estimation procedure under an appropriate probabilistic interpretation of the classifier. We also show that this interpretation extends to a hierarchical Bayesian architecture and to a fully Bayesian inference procedure for multi-class classification based on data augmentation. Empirical results show that the advantages of the Bayesian formalism are obtained without a loss in classification accuracy.
