Sparse linear models: Variational approximate inference and Bayesian experimental design

A wide range of problems, such as signal reconstruction, denoising, source separation, feature selection, and graphical model search, are addressed today by posterior maximization for linear models with sparsity-favouring prior distributions. The Bayesian posterior contains useful information far beyond its mode, which can be used to drive methods for sampling optimization (active learning), feature relevance ranking, or hyperparameter estimation, provided this representation of uncertainty can be approximated in a tractable manner. In this paper, we review recent results for variational sparse inference and show that they share underlying computational primitives. We discuss how sampling optimization can be implemented as sequential Bayesian experimental design. While there has been tremendous recent activity in developing sparse estimation, little attention has been paid to sparse approximate inference. We argue that many practical problems, such as compressive sensing for real-world image reconstruction, are served much better by proper uncertainty approximations than by ever more aggressive sparse estimation algorithms. Moreover, since some variational inference methods have recently been given strong convex optimization characterizations, theoretical analysis may become possible, promising new insights into nonlinear experimental design.
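
To make the idea of sequential Bayesian experimental design concrete, the following is a minimal sketch and not the method reviewed in the paper. It assumes a Gaussian prior in place of a sparsity-favouring one, so that the posterior is exact and no variational approximation is needed. Each candidate measurement vector is scored by its expected information gain, the best one is measured, and the posterior is updated. The names `information_gain`, `posterior_update`, `sequential_design`, and the `measure` callback are hypothetical and introduced only for illustration.

```python
import numpy as np


def information_gain(cov, x, noise_var):
    """Expected information gain (in nats) of measuring y = x.u + noise
    when u has Gaussian posterior covariance `cov` (assumed model)."""
    return 0.5 * np.log1p(x @ cov @ x / noise_var)


def posterior_update(mean, cov, x, y, noise_var):
    """Rank-one Gaussian posterior update after observing y along x."""
    cx = cov @ x
    gain = cx / (noise_var + x @ cx)
    new_mean = mean + gain * (y - x @ mean)
    new_cov = cov - np.outer(gain, cx)
    return new_mean, new_cov


def sequential_design(candidates, measure, n_steps, prior_var=1.0, noise_var=0.01):
    """Greedy design loop: score all candidates, measure the best, update."""
    d = candidates.shape[1]
    mean, cov = np.zeros(d), prior_var * np.eye(d)
    chosen = []
    for _ in range(n_steps):
        scores = [information_gain(cov, x, noise_var) for x in candidates]
        best = int(np.argmax(scores))
        x = candidates[best]
        mean, cov = posterior_update(mean, cov, x, measure(x), noise_var)
        chosen.append(best)
    return mean, cov, chosen
```

With a sparsity-favouring prior the posterior covariance is no longer available in closed form, which is where an approximate inference method would have to supply the `cov` used for scoring.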
