Gradient Hard Thresholding Pursuit for Sparsity-Constrained Optimization

Hard Thresholding Pursuit (HTP) is an iterative greedy selection procedure for finding sparse solutions of underdetermined linear systems. This method has been shown to have strong theoretical guarantees and impressive numerical performance. In this paper, we generalize HTP from compressive sensing to the generic problem of sparsity-constrained convex optimization. The proposed algorithm iterates between a standard gradient descent step and a hard thresholding step, with or without debiasing. We prove that our method enjoys strong guarantees analogous to those of HTP in terms of rate of convergence and parameter estimation accuracy. Numerical evidence shows that our method outperforms state-of-the-art greedy selection methods on sparse logistic regression and sparse precision matrix estimation tasks.
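
For concreteness, the iteration described above can be sketched as follows. This is a minimal illustration only; the function and parameter names (grad_fn, debias_fn, step_size, n_iter) are assumptions made for exposition and are not taken from the paper.

import numpy as np

def grad_htp(grad_fn, x0, k, step_size=0.1, n_iter=100, debias_fn=None):
    # Sketch of a gradient hard thresholding iteration (hypothetical names).
    # grad_fn(x): returns the gradient of the loss at x.
    # debias_fn(support): optionally returns the minimizer of the loss
    #     restricted to the coordinates in `support`, as a length-k array
    #     ordered consistently with `support`.
    x = x0.copy()
    for _ in range(n_iter):
        # 1. Standard gradient descent step on the full vector.
        z = x - step_size * grad_fn(x)
        # 2. Hard thresholding: keep the k entries of largest magnitude.
        support = np.argsort(np.abs(z))[-k:]
        x = np.zeros_like(z)
        x[support] = z[support]
        # 3. Optional debiasing: re-optimize over the selected support.
        if debias_fn is not None:
            x = np.zeros_like(z)
            x[support] = debias_fn(support)
    return x

In the compressive sensing setting, grad_fn would be x -> A.T @ (A @ x - y), and the debiasing step a least-squares fit over the selected columns, which recovers the original HTP iteration.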
