On the Suboptimality of Proximal Gradient Descent for $\ell^{0}$ Sparse Approximation

In this paper we study the proximal gradient descent (PGD) method for the $\ell^{0}$ sparse approximation problem, together with its acceleration by randomized algorithms. We first offer a theoretical analysis of PGD that bounds the gap between the sub-optimal solution produced by PGD and the globally optimal solution of the $\ell^{0}$ sparse approximation problem, under conditions weaker than the Restricted Isometry Property widely used in the compressive sensing literature. Moreover, we propose randomized algorithms that accelerate PGD using randomized low-rank matrix approximation (PGD-RMA) and randomized dimension reduction (PGD-RDR). Our randomized algorithms substantially reduce the computational cost of the original PGD for the $\ell^{0}$ sparse approximation problem, and the resultant sub-optimal solution retains provable suboptimality; that is, the sub-optimal solution to the reduced problem still has a bounded gap to the globally optimal solution of the original problem.
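To make the setting concrete, below is a minimal sketch, not the paper's exact algorithm: PGD applied to the $\ell^{0}$-regularized least-squares objective $\min_x \frac{1}{2}\|Ax-y\|_2^2 + \lambda\|x\|_0$, whose proximal step is hard thresholding, followed by a hypothetical PGD-RDR variant that runs the same iteration on a Gaussian random sketch of the data. The function names, the regularization weight `lam`, the sketch size `d`, and the fixed iteration count are illustrative assumptions, not quantities taken from the paper.

```python
import numpy as np

def hard_threshold(z, t):
    """Proximal operator of t * ||.||_0: zero out entries with |z_i| <= sqrt(2t)."""
    out = z.copy()
    out[np.abs(z) <= np.sqrt(2.0 * t)] = 0.0
    return out

def pgd_l0(A, y, lam, n_iter=500):
    """PGD (iterative hard thresholding) for 0.5*||Ax - y||^2 + lam*||x||_0."""
    _, n = A.shape
    # Step size from the Lipschitz constant of the smooth term's gradient.
    eta = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)              # gradient of the least-squares term
        x = hard_threshold(x - eta * grad, lam * eta)
    return x

def pgd_l0_rdr(A, y, lam, d, n_iter=500, seed=None):
    """Hypothetical PGD-RDR sketch: run PGD on the reduced problem
    0.5*||S(Ax - y)||^2 + lam*||x||_0 with a Gaussian sketch S of d rows."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    S = rng.standard_normal((d, m)) / np.sqrt(d)  # Johnson-Lindenstrauss-style sketch
    return pgd_l0(S @ A, S @ y, lam, n_iter)
```

The reduced problem has only `d` rows instead of `m`, so each gradient evaluation costs O(dn) rather than O(mn); the paper's analysis is what guarantees that the solution of such a reduced problem still has a bounded gap to the global optimum of the original problem.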
