Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions

This paper concerns folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite existing statistical analyses of local minimizers and of specific FCPSLR-based learning algorithms, it remains an open question whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) already suffice to ensure the desired statistical performance, and whether that performance can be made independent of the specific design of the computing procedure. To address these questions, this paper presents the following threefold results: (1) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (2) Perhaps more importantly, any local solution satisfying the significant subspace second-order necessary condition (S$^3$ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S$^3$ONC solution likely recovers the oracle solution. This result also shows that the goal of improving statistical performance is aligned with the optimization criterion of reducing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (3) We apply (2) to the special case of FCPSLR with the minimax concave penalty (MCP) and show that, under the restricted eigenvalue condition, any S$^3$ONC solution with a better objective value than the Lasso solution enjoys the strong oracle property. In addition, such a solution attains a model error (ME) comparable to that of the optimal but exponential-time sparse estimator given a sufficient sample size, while its worst-case ME is comparable to that of the Lasso in general. Furthermore, the computation of an S$^3$ONC solution admits an FPTAS.
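For concreteness, the display below sketches a standard formulation of FCPSLR with the minimax concave penalty; the notation (design matrix $X$, response $y$, tuning parameters $\lambda > 0$ and $a > 0$) and the exact scaling of the least-squares loss are assumptions for illustration, since the abstract above does not fix them.

$$\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\,\|y - X\beta\|_2^2 \;+\; \sum_{j=1}^{p} P_{\lambda,a}\big(|\beta_j|\big), \qquad P_{\lambda,a}(t) \;=\; \begin{cases} \lambda t - \dfrac{t^2}{2a}, & 0 \le t \le a\lambda,\\[1ex] \dfrac{a\lambda^2}{2}, & t > a\lambda. \end{cases}$$

The penalty is folded concave: it behaves like the Lasso penalty $\lambda |t|$ near zero but levels off at the constant $a\lambda^2/2$ for large coefficients, which removes the Lasso's bias on strong signals while making the overall optimization problem non-convex.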
