Paper Information - Analysis of Sparse Bayesian Learning

Analysis of Sparse Bayesian Learning

The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that, conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model.
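The closed-form maximum and the sparsity criterion described in the abstract can be illustrated with a minimal sketch. The abstract itself gives no formulas, so everything below is an assumption based on the commonly stated form of the result: for one hyperparameter alpha, the log marginal likelihood contributes 0.5 * (log(alpha) - log(alpha + s) + q^2 / (alpha + s)), where s (a 'sparsity factor') and q (a 'quality factor') are scalars computed from the remaining basis functions and the data. The function names alpha_term and optimal_alpha are hypothetical and chosen only for illustration.

```python
import math


def alpha_term(alpha, s, q):
    """Assumed contribution of a single hyperparameter to the log marginal likelihood."""
    if math.isinf(alpha):
        # Limit as alpha -> infinity: the basis function is pruned and contributes nothing.
        return 0.0
    return 0.5 * (math.log(alpha) - math.log(alpha + s) + q * q / (alpha + s))


def optimal_alpha(s, q):
    """Closed-form maximiser of alpha_term for fixed s and q (assumed form)."""
    if q * q > s:
        # Finite maximum: the basis function stays in the model.
        return s * s / (q * q - s)
    # Sparsity criterion satisfied (q^2 <= s): the maximum is at infinity, i.e. pruning.
    return math.inf


if __name__ == "__main__":
    # Example values for s and q are made up for illustration.
    print(optimal_alpha(1.0, 2.0))  # finite value: parameter retained
    print(optimal_alpha(2.0, 1.0))  # inf: parameter pruned
```

In this sketch, an infinite alpha corresponds to a prior that pins the associated weight at zero, which is what 'pruning' the parameter means here.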

Michael E. Tipping | Anita C. Faul
