Sparse Approximation via Penalty Decomposition Methods

In this paper we consider sparse approximation problems, that is, general $l_0$ minimization problems in which the $l_0$-"norm" of a vector appears in the constraints or in the objective function. We first study the first-order optimality conditions for these problems. We then propose penalty decomposition (PD) methods for solving them, in which a sequence of penalty subproblems is solved by a block coordinate descent (BCD) method. Under suitable assumptions, we establish that any accumulation point of the sequence generated by the PD methods satisfies the first-order optimality conditions of the problems. Furthermore, for problems in which the $l_0$ part is the only nonconvex part, we show that such an accumulation point is a local minimizer of the problems. In addition, we show that any accumulation point of the sequence generated by the BCD method is a saddle point of the penalty subproblem. Moreover, for problems in which the $l_0$ part is the only nonconvex part, we establish that such an accumulation point is a local minimizer of the penalty subproblem. Finally, we test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
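
To make the PD/BCD structure described above concrete, the following is a minimal sketch for one simple instance, the $l_0$-constrained least-squares problem $\min_x \frac{1}{2}\|Ax-b\|^2$ subject to $\|x\|_0 \le r$. It is an illustration under stated assumptions, not the paper's algorithm: the variable splitting $x = y$, the quadratic penalty $\frac{\rho}{2}\|x-y\|^2$, the penalty growth factor, the stopping tolerance, and all function names (`solve_linear`, `hard_threshold`, `penalty_decomposition`) are choices made here for illustration, and the safeguards and termination criteria a full method would need are omitted.

```python
def solve_linear(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def hard_threshold(x, r):
    """Keep the r largest-magnitude entries of x and zero out the rest."""
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:r])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def penalty_decomposition(A, b, r, rho=1.0, growth=10.0 ** 0.5,
                          outer=30, inner=100, tol=1e-10):
    """Illustrative PD loop; parameter defaults are assumptions, not tuned values."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    y = [0.0] * n
    for _ in range(outer):          # outer loop: one penalty subproblem per rho
        for _ in range(inner):      # inner loop: BCD on the blocks x and y
            # x-step: minimize 0.5*||Ax-b||^2 + 0.5*rho*||x-y||^2 over x,
            # i.e. solve (A^T A + rho*I) x = A^T b + rho*y.
            M = [[AtA[i][j] + (rho if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
            rhs = [Atb[i] + rho * y[i] for i in range(n)]
            x = solve_linear(M, rhs)
            # y-step: minimize ||x-y||^2 subject to ||y||_0 <= r.
            y_new = hard_threshold(x, r)
            change = max(abs(u - v) for u, v in zip(y_new, y))
            y = y_new
            if change <= tol:
                break
        rho *= growth               # tighten the coupling between x and y
    return y                        # y always satisfies the sparsity constraint
```

In this sketch both block updates have closed forms: the x-step is a regularized linear solve and the y-step is a projection onto the sparsity constraint. Returning `y` rather than `x` guarantees that the output has at most `r` nonzero entries.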
