Bayes Machines for binary classification

In this work, we propose an approach to binary classification based on an extension of Bayes Point Machines. In particular, we take into account the whole set of hypotheses that are consistent with the data (the so-called version space) as well as the intrinsic noise in the class labels. We follow a Bayesian approach and compute an approximate posterior distribution over the model parameters, which yields a predictive distribution for unseen data. The most compelling feature of the proposed model is that it learns the noise present in the data at no additional cost. All computations are carried out with Expectation Propagation, an algorithm for approximate Bayesian inference. Experimental results indicate that the proposed approach outperforms Support Vector Machines on several of the classification problems studied and is competitive with other Bayesian classification algorithms based on Gaussian Processes.
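To make the setting concrete, consider the standard formulation on which Bayes Point Machines build: a linear classifier \(\hat{y} = \operatorname{sign}(\mathbf{w}^{\top}\mathbf{x})\) combined with a label-flip noise model. The notation below (\(\epsilon\) for the flip probability, \(\Theta\) for the Heaviside step function) is a common choice and is assumed here rather than taken verbatim from the paper:

```latex
% Likelihood of a label y_i \in \{-1,+1\} under label-flip noise \epsilon:
p(y_i \mid \mathbf{x}_i, \mathbf{w}) \;=\; \epsilon + (1 - 2\epsilon)\,
    \Theta\!\left(y_i\,\mathbf{w}^{\top}\mathbf{x}_i\right)

% Posterior over the weights given data \mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}:
p(\mathbf{w} \mid \mathcal{D}) \;\propto\; p(\mathbf{w})
    \prod_{i=1}^{n} p(y_i \mid \mathbf{x}_i, \mathbf{w})

% Predictive distribution for a new input \mathbf{x}_*:
p(y_* \mid \mathbf{x}_*, \mathcal{D}) \;=\;
    \int p(y_* \mid \mathbf{x}_*, \mathbf{w})\, p(\mathbf{w} \mid \mathcal{D})\, d\mathbf{w}
```

Expectation Propagation approximates the intractable posterior by a Gaussian \(\mathcal{N}(\mathbf{m}, \mathbf{V})\), under which the predictive integral has a closed form: the activation \(\mathbf{w}^{\top}\mathbf{x}_*\) is then itself Gaussian, so the probability that it is positive is a normal CDF. The following minimal sketch implements only that final prediction step, assuming such a Gaussian approximation has already been computed; the names `m`, `V`, and `eps` are hypothetical placeholders:

```python
# Minimal sketch (assumed notation): predictive probability of a linear
# Bayesian classifier with label-flip noise, given a Gaussian posterior
# N(m, V) over the weights w, e.g. as produced by Expectation Propagation.
import numpy as np
from scipy.stats import norm


def predict_proba(x_new, m, V, eps):
    """Return P(y = +1 | x_new) under w ~ N(m, V) and flip probability eps.

    For Gaussian w, the activation a = w @ x_new is Gaussian with mean
    m @ x_new and variance x_new @ V @ x_new (V assumed positive-definite),
    so P(a > 0) is a normal CDF; the flip noise mixes this with its
    complement via eps + (1 - 2*eps) * P(a > 0).
    """
    mean_act = m @ x_new              # posterior mean of the activation
    var_act = x_new @ V @ x_new       # posterior variance of the activation
    p_margin = norm.cdf(mean_act / np.sqrt(var_act))
    return eps + (1.0 - 2.0 * eps) * p_margin


# Toy usage with an assumed 2-D posterior:
m = np.array([1.0, -0.5])
V = np.array([[0.2, 0.0], [0.0, 0.2]])
print(predict_proba(np.array([0.8, 0.3]), m, V, eps=0.05))
```

Note how \(\epsilon\) enters the predictive distribution directly, which is consistent with the claim above that the noise level can be handled within the same inference procedure rather than tuned separately.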
