A Theory of High-dimensional Sparse Estimation via Non-Convex Regularized Regression

Non-convex regularized regression improves the performance of high-dimensional sparse estimation. Compared with convex regularizers, one important advantage is the weaker requirement placed on the design matrix, that is, a weaker estimation condition. The estimation condition is a core issue in high-dimensional sparse estimation. However, previous analyses of non-convex regularized regression demanded the same estimation conditions as the convex case, which fails to explain the superiority of non-convex regularizers from the viewpoint of estimation conditions and limits their further application. This paper fills the gap between theory and practice by proposing new estimation conditions based on sparse eigenvalues. For a general family of regularizers, which we call \xi-sharp concave regularizers, our conditions are weaker than those required by convex regularizers. Moreover, consistent sparse estimation is available not only for the global solutions of the regularized regression but also for the so-called approximate global and approximate stationary (AGAS) solutions. Our results on AGAS solutions are useful in practice since we show the robustness of non-convex regularized regression to inaccuracy in the solutions and give a theoretical guarantee for numerical solutions. We also give a quality guarantee for any solution regarded as an approximate global solution and prove that the desired approximate stationary solutions can be obtained simply by coordinate descent methods. This paper provides a general theory of non-convex high-dimensional sparse estimation and can serve as a guideline for selecting regularizers and developing algorithms for non-convex regularized regression.
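To make the coordinate-descent claim concrete, the following is a minimal sketch (not the paper's own algorithm) of cyclic coordinate descent for least squares with a concave MCP-style penalty. It assumes the design matrix has standardized columns (each column's mean square equals one) and a concavity parameter gamma > 1, so each coordinate update has the well-known closed form; the function names `soft_threshold`, `mcp_update`, and `cd_mcp` are illustrative, not from the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator (the lasso coordinate update)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def mcp_update(z, lam, gamma):
    """Closed-form coordinate minimizer under the MCP penalty.

    Assumes standardized columns and gamma > 1. For |z| beyond
    gamma * lam the penalty is flat, so the update is unbiased.
    """
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z

def cd_mcp(X, y, lam, gamma=3.0, n_iter=100):
    """Cyclic coordinate descent for (1/2n)||y - X b||^2 + MCP(b)."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta  # running residual, kept in sync with beta
    for _ in range(n_iter):
        for j in range(p):
            # Partial-residual correlation for coordinate j.
            z = X[:, j] @ r / n + beta[j]
            b_new = mcp_update(z, lam, gamma)
            if b_new != beta[j]:
                r -= X[:, j] * (b_new - beta[j])
                beta[j] = b_new
    return beta
```

On a well-conditioned random design, such a scheme typically recovers the support of a sparse signal and, because the MCP penalty flattens out, returns nearly unbiased estimates of the large coefficients, in contrast to the lasso's systematic shrinkage.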
