Bayesian Classification With Gaussian Processes

We consider the problem of assigning an input vector $x$ to one of $m$ classes by predicting $P(c|x)$ for $c = 1, \ldots, m$. For a two-class problem, the probability of class one given $x$ is estimated by $\sigma(y(x))$, where $\sigma(y) = 1/(1 + e^{-y})$. A Gaussian process prior is placed on $y(x)$ and combined with the training data to obtain predictions for new inputs $x$. We provide a Bayesian treatment, integrating over uncertainty in $y$ and in the parameters that control the Gaussian process prior; the necessary integration over $y$ is carried out using Laplace's approximation. The method is generalized to multiclass problems ($m > 2$) using the softmax function. We demonstrate the effectiveness of the method on a number of datasets.
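The two-class procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-exponential kernel, its fixed `length_scale` and `variance`, the Newton iteration used to find the posterior mode, and the closed-form approximation used to average the sigmoid over the latent Gaussian are all choices made here for illustration. In particular, the kernel parameters are held fixed rather than integrated over, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(y):
    """Logistic function sigma(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + np.exp(-y))

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance (an assumed choice of GP prior)."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def laplace_mode(K, t, n_iter=50, tol=1e-8):
    """Newton iterations for the mode of p(y | data); labels t are in {0, 1}."""
    n = len(t)
    y = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(y)
        W = pi * (1.0 - pi)              # negative Hessian of the log-likelihood
        sqrtW = np.sqrt(W)
        B = np.eye(n) + sqrtW[:, None] * K * sqrtW[None, :]
        L = np.linalg.cholesky(B)
        b = W * y + (t - pi)             # t - pi is the log-likelihood gradient
        v = np.linalg.solve(L.T, np.linalg.solve(L, sqrtW * (K @ b)))
        y_new = K @ (b - sqrtW * v)      # Newton update in a numerically stable form
        converged = np.max(np.abs(y_new - y)) < tol
        y = y_new
        if converged:
            break
    return y

def predict_proba(X, t, X_new, length_scale=1.0, variance=1.0):
    """Approximate P(class 1 | x) at each row of X_new."""
    K = rbf_kernel(X, X, length_scale, variance)
    y_hat = laplace_mode(K, t)
    pi = sigmoid(y_hat)
    sqrtW = np.sqrt(pi * (1.0 - pi))
    k_star = rbf_kernel(X, X_new, length_scale, variance)
    # Gaussian (Laplace) approximation to the latent y at the new inputs.
    mean = k_star.T @ (t - pi)
    B = np.eye(len(t)) + sqrtW[:, None] * K * sqrtW[None, :]
    L = np.linalg.cholesky(B)
    v = np.linalg.solve(L, sqrtW[:, None] * k_star)
    var = np.maximum(variance - np.sum(v**2, axis=0), 0.0)
    # Assumption: average the sigmoid over the Gaussian with the common
    # approximation sigma(kappa * mean), kappa = (1 + pi * var / 8)^(-1/2).
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var / 8.0)
    return sigmoid(kappa * mean)

# Toy data: negative inputs belong to class 0, positive inputs to class 1.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
t = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
p = predict_proba(X, t, np.array([[-1.5], [1.5]]))
print(p)
```

On the toy data the predicted probability falls below 0.5 for the negative test input and above 0.5 for the positive one. A fuller treatment along the lines of the abstract would also average these predictions over the kernel parameters instead of fixing them.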
