Latent Variable Bayesian Models for Promoting Sparsity

Many practical methods for finding maximally sparse coefficient expansions involve solving a regression problem using a particular class of concave penalty functions. From a Bayesian perspective, this process is equivalent to maximum a posteriori (MAP) estimation using a sparsity-inducing prior distribution (Type I estimation). Using variational techniques, this prior can always be expressed conveniently as a maximization over scaled Gaussian distributions modulated by a set of latent variables. Alternative Bayesian algorithms, which operate in latent variable space and leverage this variational representation, lead to sparse estimators that reflect posterior information beyond the mode (Type II estimation). At present it is unclear how the underlying cost functions of Type I and Type II relate to one another, or what relevant theoretical properties they possess, especially in the case of Type II. Herein, a common set of auxiliary functions is used to express both Type I and Type II cost functions in either coefficient or latent variable space, facilitating direct comparisons. In coefficient space, the analysis reveals that Type II is exactly equivalent to performing standard MAP estimation using a particular class of dictionary- and noise-dependent, nonfactorial coefficient priors. At least one prior from this class maintains several desirable advantages over all possible Type I methods and utilizes a novel, nonconvex approximation to the ℓ0 norm in which most, and under certain quantifiable conditions all, local minima are smoothed away. Importantly, the global minimum is always left unaltered, unlike standard ℓ1-norm relaxations; consequently, whenever those conditions hold, any appropriate descent method is guaranteed to locate the maximally sparse solution.
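The Type I/Type II distinction sketched above can be made concrete using the standard formulations from the sparse Bayesian learning literature; the notation below (y, Φ, x, γ, λ, g, φ) and the flat-hyperprior simplification are illustrative assumptions, not quotations from the paper. With observation model y = \Phi x + \epsilon, \epsilon \sim \mathcal{N}(0, \lambda I):

Type I (MAP) cost in coefficient space:
  \hat{x}_{\mathrm{I}} = \arg\min_{x} \; \|y - \Phi x\|_2^2 + \lambda \sum_i g(x_i^2), with g concave and nondecreasing on [0, \infty).

Variational (scaled-Gaussian) form of the prior, with latent variances \gamma_i \ge 0:
  p(x_i) = \max_{\gamma_i \ge 0} \; \mathcal{N}(x_i; 0, \gamma_i)\, \varphi(\gamma_i), for a suitable nonnegative function \varphi.

Type II cost in latent variable space (assuming a flat hyperprior, as in standard ARD/SBL):
  \hat{\gamma} = \arg\min_{\gamma \ge 0} \; y^{\top} \Sigma_y^{-1} y + \log|\Sigma_y|, where \Sigma_y = \lambda I + \Phi\, \mathrm{diag}(\gamma)\, \Phi^{\top},

with coefficients then recovered as the posterior mean \hat{x}_{\mathrm{II}} = \mathrm{diag}(\hat{\gamma})\, \Phi^{\top} \Sigma_y^{-1} y.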
