Extended Linear Models with Gaussian Prior on the Parameters and Adaptive Expansion Vectors

We present an approximate Bayesian method for regression and classification with models that are linear in the parameters. As in the Relevance Vector Machine (RVM), each parameter is associated with an expansion vector. Unlike the RVM, the number of expansion vectors is fixed in advance. We place an overall Gaussian prior on the parameters and use a gradient-based procedure to find the expansion vectors that (locally) maximize the evidence. This approach has lower computational demands than the RVM, and the expansion vectors need not belong to the training set, so in principle better vectors can be found. Furthermore, other hyperparameters can be learned in the same smooth joint optimization. Experimental results show, however, that the freedom of the expansion vectors to move away from the training data causes overfitting. This problem is alleviated by including a hyperprior that penalizes expansion vectors located far from the input data.
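
The idea of treating the expansion vectors as free quantities and moving them to increase the evidence can be illustrated with a minimal sketch for the regression case. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian (RBF) basis function, the fixed values of `alpha`, `noise_var` and `length_scale`, the function names, and the use of central finite differences with a backtracking step in place of analytic gradients. The classification case and the hyperprior on the expansion vectors are omitted.

```python
import numpy as np


def rbf_design(X, Z, length_scale=1.0):
    """Design matrix Phi[i, j] = exp(-||x_i - z_j||^2 / (2 * length_scale^2)).

    X: (N, D) inputs, Z: (M, D) expansion vectors (assumed RBF basis).
    """
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def log_evidence(Z, X, y, alpha=1.0, noise_var=0.1, length_scale=1.0):
    """Log marginal likelihood of y for the model y = Phi w + noise.

    With w ~ N(0, I / alpha) and Gaussian noise of variance noise_var,
    y ~ N(0, C) where C = noise_var * I + Phi Phi^T / alpha.
    """
    n = X.shape[0]
    Phi = rbf_design(X, Z, length_scale)
    C = noise_var * np.eye(n) + Phi @ Phi.T / alpha
    L = np.linalg.cholesky(C)      # C = L L^T
    v = np.linalg.solve(L, y)      # v^T v = y^T C^{-1} y
    half_log_det = np.log(np.diag(L)).sum()
    return -0.5 * v @ v - half_log_det - 0.5 * n * np.log(2.0 * np.pi)


def numerical_grad(f, Z, eps=1e-5):
    """Central finite-difference gradient of f with respect to every entry of Z."""
    grad = np.zeros_like(Z)
    for idx in np.ndindex(*Z.shape):
        step = np.zeros_like(Z)
        step[idx] = eps
        grad[idx] = (f(Z + step) - f(Z - step)) / (2.0 * eps)
    return grad


def fit_expansion_vectors(X, y, Z0, lr=0.1, steps=20, **kwargs):
    """Gradient ascent on the log evidence with respect to the expansion vectors.

    A step is accepted only if it increases the evidence, so the result is
    never worse than Z0; it is a local optimum at best.
    """
    def f(Z_):
        return log_evidence(Z_, X, y, **kwargs)

    Z = Z0.copy()
    for _ in range(steps):
        g = numerical_grad(f, Z)
        step = lr
        # Backtrack: halve the step until the evidence improves.
        while step > 1e-8 and f(Z + step * g) <= f(Z):
            step *= 0.5
        if step > 1e-8:
            Z = Z + step * g
    return Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3.0, 3.0, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
    Z0 = rng.uniform(-3.0, 3.0, size=(5, 1))  # number of vectors fixed in advance
    Z = fit_expansion_vectors(X, y, Z0)
    print("log evidence before:", log_evidence(Z0, X, y))
    print("log evidence after: ", log_evidence(Z, X, y))
```

Note that the optimized `Z` is unconstrained and may drift away from the rows of `X`, which is the freedom the abstract identifies as a source of overfitting. For clarity the sketch builds the full N x N covariance; an implementation concerned with cost would more likely work with an M x M system via the matrix inversion lemma.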
