Iterative Reweighted ℓ1 and ℓ2 Methods for Finding Sparse Solutions

A variety of practical methods have recently been introduced for finding maximally sparse representations from overcomplete dictionaries, a central computational task in compressive sensing as well as numerous other applications. Many of the underlying algorithms rely on iterative reweighting schemes that produce more focal estimates as optimization progresses. Two such variants are iterative reweighted ℓ1 and ℓ2 minimization; however, some properties related to convergence and sparse estimation, as well as possible generalizations, are still not clearly understood or fully exploited. In this paper, we make the distinction between separable and non-separable iterative reweighting algorithms. The vast majority of existing methods are separable, meaning the weighting of a given coefficient at each iteration is only a function of that individual coefficient from the previous iteration (as opposed to depending on all coefficients). We examine two such separable reweighting schemes: an ℓ2 method from Chartrand and Yin (2008) and an ℓ1 approach from Candès et al. (2008), elaborating on convergence results and explicit connections between them. We then explore an interesting non-separable alternative that can be implemented via either ℓ2 or ℓ1 reweighting and that maintains several desirable properties relevant to sparse recovery despite a highly non-convex underlying cost function. For example, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the minimum ℓ1 solution in that (1) it can never do worse when implemented with reweighted ℓ1, and (2) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex (and separable) penalty functions for finding sparse solutions. We then derive a new non-separable variant with similar properties that exhibits further performance improvements in empirical tests. Finally, we address natural extensions to group sparsity problems and non-negative sparse coding.
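
To make the separable schemes above concrete, the following sketch implements both weight updates for the noiseless problem of finding a sparse x satisfying Phi x = y. This is a minimal illustration in Python (NumPy/SciPy), not the authors' reference implementation; the parameter choices (p, eps, n_iters) and the simple eps-annealing schedule are assumptions made for the example.

import numpy as np
from scipy.optimize import linprog

def irls_l2(Phi, y, p=0.5, n_iters=30, eps=1e-2):
    # Chartrand/Yin-style iterative reweighted l2 (IRLS): each iteration solves
    #   min_x  sum_i w_i * x_i**2   s.t.  Phi @ x = y,
    # with separable weights w_i = (x_i**2 + eps)**(p/2 - 1) taken from the
    # previous iterate. The weighted subproblem has the closed form
    #   x = Q Phi^T (Phi Q Phi^T)^{-1} y,  with Q = diag(1/w).
    x = np.linalg.pinv(Phi) @ y                 # minimum-l2-norm initialization
    for _ in range(n_iters):
        q = (x**2 + eps) ** (1.0 - p / 2.0)    # diagonal of Q (inverse weights)
        x = q * (Phi.T @ np.linalg.solve((Phi * q) @ Phi.T, y))
        eps = max(0.1 * eps, 1e-8)             # assumed schedule annealing eps toward 0
    return x

def reweighted_l1(Phi, y, n_iters=5, eps=1e-3):
    # Candes-et-al.-style iterative reweighted l1: each iteration solves the LP
    #   min_x  sum_i w_i * |x_i|   s.t.  Phi @ x = y,
    # with separable weights w_i = 1 / (|x_i| + eps) from the previous iterate.
    # The LP uses auxiliary variables t with -t <= x <= t (assumes each LP solves).
    m, n = Phi.shape
    w = np.ones(n)
    for _ in range(n_iters):
        c = np.concatenate([np.zeros(n), w])
        A_ub = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n),
                      A_eq=np.hstack([Phi, np.zeros((m, n))]), b_eq=y,
                      bounds=[(None, None)] * n + [(0, None)] * n)
        x = res.x[:n]
        w = 1.0 / (np.abs(x) + eps)
    return x

# Example: with enough random measurements, both schemes typically recover a
# sparse ground-truth vector up to numerical tolerance.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[[3, 17, 42, 77]] = rng.standard_normal(4)
y = Phi @ x_true
print(np.max(np.abs(irls_l2(Phi, y) - x_true)))

Both weight updates are separable in the sense defined above: the weight w_i depends only on the i-th coefficient of the previous iterate. The non-separable alternatives studied in this paper instead couple the weights across all coefficients.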

[1] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society: Series B, 2006.

[2] J.-J. Fuchs, "On sparse representations in arbitrary redundant bases," IEEE Transactions on Information Theory, 2004.

[3] B. D. Rao et al., "Subset selection in noise based on diversity measure minimization," IEEE Transactions on Signal Processing, 2003.

[4] R. Gribonval and M. Nielsen, "Sparse representations in unions of bases," IEEE Transactions on Information Theory, 2003.

[5] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, 2001.

[6] J. A. Tropp et al., "Algorithms for simultaneous sparse approximation," 2006.

[7] B. D. Rao and K. Kreutz-Delgado, "An affine scaling methodology for best basis selection," IEEE Transactions on Signal Processing, 1999.

[8] A. M. Bruckstein, M. Elad, and M. Zibulevsky, "A non-negative and sparse enough solution of an underdetermined linear system of equations is unique," 2007.

[9] W. I. Zangwill, Nonlinear Programming: A Unified Approach, Prentice-Hall, 1969.

[10] K. Kreutz-Delgado and B. D. Rao, "Basis selection in the presence of noise," Proc. 32nd Asilomar Conference on Signals, Systems and Computers, 1998.

[11] M. A. T. Figueiredo, "Adaptive sparseness for supervised learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003.

[12] D. P. Wipf, "Bayesian methods for finding sparse representations," Ph.D. dissertation, University of California, San Diego, 2006.

[13] A. M. Bruckstein, M. Elad, and M. Zibulevsky, "On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations," IEEE Transactions on Information Theory, 2008.

[14] J. A. Tropp, "Algorithms for simultaneous sparse approximation. Part II: Convex relaxation," Signal Processing, 2006.

[15] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.

[16] D. P. Wipf and S. S. Nagarajan, "A new view of automatic relevance determination," Advances in Neural Information Processing Systems (NIPS), 2007.

[17] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution," Communications on Pure and Applied Mathematics, 2006.

[18] M. E. Davies and R. Gribonval, "Restricted isometry constants where ℓp sparse recovery can fail for 0 < p ≤ 1," IEEE Transactions on Information Theory, 2009.

[19] D. M. Malioutov, M. Çetin, and A. S. Willsky, "Optimal sparse representations in general overcomplete bases," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[20] J. G. Silva, J. S. Marques, and J. M. Lemos, "Selecting landmark points for sparse manifold learning," NIPS, 2005.

[21] D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization," Proceedings of the National Academy of Sciences, 2003.

[22] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk, "Iteratively reweighted least squares minimization for sparse recovery," 2008, arXiv:0807.0575.

[23] R. E. Showalter, Monotone Operators in Banach Space and Nonlinear Partial Differential Equations, 1996.

[24] M. F. Duarte, S. Sarvotham, D. Baron, M. B. Wakin, and R. G. Baraniuk, "Recovery of jointly sparse signals from few random projections," NIPS, 2005.

[25] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Transactions on Information Theory, 2004.

[26] E. J. Candès, M. B. Wakin, and S. P. Boyd, "Enhancing sparsity by reweighted ℓ1 minimization," Journal of Fourier Analysis and Applications, 2008, arXiv:0711.1612.

[27] D. P. Wipf, J. P. Owen, H. T. Attias, K. Sekihara, and S. S. Nagarajan, "Robust Bayesian estimation of the location, orientation, and time course of multiple correlated neural sources using MEG," NeuroImage, 2010.

[28] D. P. Wipf, B. D. Rao, and S. S. Nagarajan, "Latent variable Bayesian models for promoting sparsity," IEEE Transactions on Information Theory, 2011.

[29] J. A. Palmer et al., "Variational EM algorithms for non-Gaussian latent variable models," NIPS, 2005.

[30] M. Fazel, H. Hindi, and S. P. Boyd, "Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices," Proc. American Control Conference, 2003.

[31] R. Chartrand and W. Yin, "Iteratively reweighted algorithms for compressive sensing," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.

[32] M. A. T. Figueiredo, J. M. Bioucas-Dias, and R. D. Nowak, "Majorization-minimization algorithms for wavelet-based image restoration," IEEE Transactions on Image Processing, 2007.

[33] R. M. Neal, Bayesian Learning for Neural Networks, 1995.