Efficient Forward Regression with Marginal Likelihood

We propose an efficient forward regression algorithm based on greedy optimization of the marginal likelihood. It can be understood as a forward selection procedure that, at each step, adds the basis vector yielding the largest increase in the marginal likelihood. The computational cost of our algorithm is linear in the number n of training examples and quadratic in the number k of selected basis vectors, i.e. O(nk^2). Moreover, our approach only needs to store a small fraction of the columns of the full design matrix. We compare our algorithm with the well-known Relevance Vector Machine (RVM), which also optimizes the marginal likelihood iteratively. The results show that our algorithm achieves comparable prediction accuracy while scaling significantly better in both computational cost and memory requirements.
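To make the selection criterion concrete, the following is a minimal sketch of greedy forward selection by marginal likelihood. It is not the algorithm of the paper: it assumes a zero-mean isotropic Gaussian prior on the weights with precision `alpha` and a fixed noise variance `noise_var`, and it re-evaluates the evidence from scratch for every candidate column instead of using an incremental update, so it does not reach the O(nk^2) total cost stated above. The function names and default values are hypothetical.

```python
import numpy as np


def log_marginal_likelihood(Phi, y, alpha=1.0, noise_var=0.1):
    """Log evidence of y under y ~ N(0, noise_var * I + Phi @ Phi.T / alpha).

    Uses the k x k form (Woodbury identity and matrix determinant lemma),
    so one evaluation costs O(n k^2) for an n x k design matrix Phi.
    """
    n, k = Phi.shape
    const = n * np.log(2.0 * np.pi) + n * np.log(noise_var)
    if k == 0:
        # No basis vectors selected: the covariance is noise_var * I.
        return -0.5 * (y @ y / noise_var + const)
    A = alpha * np.eye(k) + Phi.T @ Phi / noise_var
    L = np.linalg.cholesky(A)              # A = L @ L.T
    b = Phi.T @ y / noise_var
    z = np.linalg.solve(L, b)              # z @ z equals b.T @ inv(A) @ b
    quad = y @ y / noise_var - z @ z       # y.T @ inv(C) @ y
    logdet = 2.0 * np.sum(np.log(np.diag(L))) - k * np.log(alpha)
    return -0.5 * (quad + logdet + const)


def forward_select(Phi_full, y, k_max, alpha=1.0, noise_var=0.1):
    """Greedily add the column that most increases the log evidence.

    Stops after k_max columns, or earlier if no remaining column
    improves the evidence.
    """
    m = Phi_full.shape[1]
    selected = []
    current = log_marginal_likelihood(Phi_full[:, selected], y, alpha, noise_var)
    for _ in range(k_max):
        best_j, best_val = None, current
        for j in range(m):
            if j in selected:
                continue
            val = log_marginal_likelihood(
                Phi_full[:, selected + [j]], y, alpha, noise_var
            )
            if val > best_val:
                best_j, best_val = j, val
        if best_j is None:
            break
        selected.append(best_j)
        current = best_val
    return selected, current


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Phi_demo = rng.standard_normal((50, 10))
    y_demo = 2.0 * Phi_demo[:, 3] - 3.0 * Phi_demo[:, 7]
    y_demo = y_demo + 0.1 * rng.standard_normal(50)
    print(forward_select(Phi_demo, y_demo, k_max=2))
```

In this sketch only the selected columns and the one candidate under evaluation enter each computation, which mirrors the memory argument of the abstract. A practical implementation would update the Cholesky factor incrementally rather than refactorizing for every candidate.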
