The Trimmed Lasso: Sparse Recovery Guarantees and Practical Optimization by the Generalized Soft-Min Penalty

We present a new approach to the sparse approximation or best subset selection problem, namely, to find a $k$-sparse vector ${\bf x}\in\mathbb{R}^d$ that minimizes the $\ell_2$ residual $\lVert A{\bf x}-{\bf y} \rVert_2$. We consider a regularized approach, whereby this residual is penalized by the non-convex $\textit{trimmed lasso}$, defined as the $\ell_1$-norm of ${\bf x}$ excluding its $k$ largest-magnitude entries. We prove that the trimmed lasso has several appealing theoretical properties, and in particular derive sparse recovery guarantees assuming successful optimization of the penalized objective. Next, we show empirically that directly optimizing this objective can be quite challenging. Instead, we propose a surrogate for the trimmed lasso, called the $\textit{generalized soft-min}$. This penalty smoothly interpolates between the classical lasso and the trimmed lasso, while taking into account all possible $k$-sparse patterns. The generalized soft-min penalty involves a summation over $\binom{d}{k}$ terms, yet we derive a polynomial-time algorithm to compute it. This, in turn, yields a practical method for the original sparse approximation problem. Via simulations, we demonstrate its competitive performance compared to the current state of the art.
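To make the two penalties concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper). The trimmed-lasso function follows the definition in the abstract; the soft-min function uses a log-sum-exp form that we assume for illustration, evaluated by brute force over all $\binom{d}{k}$ support patterns rather than by the paper's polynomial-time algorithm.

```python
import numpy as np
from itertools import combinations

def trimmed_lasso(x, k):
    """l1-norm of x excluding its k largest-magnitude entries."""
    mags = np.sort(np.abs(x))                # magnitudes in ascending order
    return mags[:max(x.size - k, 0)].sum()   # drop the k largest, sum the rest

def soft_min_bruteforce(x, k, gamma):
    """Brute-force soft-min over all C(d, k) support patterns.

    NOTE: this log-sum-exp form is an assumption made for illustration;
    the paper's algorithm computes its penalty in polynomial time instead
    of enumerating all supports.
    """
    d = x.size
    # For each candidate support S of size k, the l1-norm of x off S.
    vals = np.array([np.abs(np.delete(x, S)).sum()
                     for S in combinations(range(d), k)])
    # Soft minimum: tends to min(vals) (the trimmed lasso) as gamma -> inf,
    # and to the mean of vals (proportional to the l1-norm) as gamma -> 0.
    return -np.log(np.mean(np.exp(-gamma * vals))) / gamma

if __name__ == "__main__":
    x = np.array([0.0, 2.0, 0.0, -1.5, 0.3])
    print(trimmed_lasso(x, 3))                  # 0.0: x is exactly 3-sparse
    print(soft_min_bruteforce(x, 3, gamma=50))  # approaches 0.0 as gamma grows
```

The example shows the interpolation claimed in the abstract: for an exactly $k$-sparse vector the trimmed lasso vanishes, and the soft-min surrogate approaches that value as its sharpness parameter grows.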
