Combined l1 and greedy l0 penalized least squares for linear model selection

We introduce a computationally effective algorithm for linear model selection consisting of three steps: screening, ordering, and selection (SOS). Screening of predictors is based on the thresholded Lasso, that is, l1-penalized least squares. The screened predictors are then fitted using least squares (LS) and ordered with respect to their |t| statistics. Finally, a model is selected using the greedy generalized information criterion (GIC), that is, l0-penalized LS over the nested family of models induced by the ordering. We give non-asymptotic upper bounds on the error probability of each step of the SOS algorithm in terms of both penalties. We then obtain selection consistency for different (n, p) scenarios under conditions that are needed for screening consistency of the Lasso. Our error bounds and numerical experiments show that SOS is a worthwhile alternative to multi-stage convex relaxation, the latest quasiconvex penalized LS method. For the traditional setting (n > p), we give Sanov-type bounds on the error probabilities of the ordering-selection algorithm. A surprising consequence of our bounds is that the selection error of greedy GIC is asymptotically no larger than that of exhaustive GIC.
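
The following is a minimal sketch of the three SOS steps in Python, assuming scikit-learn's Lasso for the l1-penalized screening step and statsmodels OLS for the |t| statistics. The Lasso penalty `lam`, the threshold `delta`, and the GIC penalty `log(p)` per parameter are illustrative assumptions, not the tuning choices analyzed in the paper.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso


def sos_select(X, y, lam=0.1, delta=0.05, penalty=None):
    """Sketch of screening-ordering-selection (SOS) for linear model selection."""
    n, p = X.shape
    if penalty is None:
        penalty = np.log(p)  # assumed per-parameter GIC penalty; the paper studies general penalties

    # Step 1 (screening): thresholded Lasso -- l1-penalized LS, then drop small coefficients.
    lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y)
    screened = np.flatnonzero(np.abs(lasso.coef_) > delta)
    if screened.size == 0:
        return screened

    # Step 2 (ordering): LS fit on the screened predictors, ordered by decreasing |t| statistics.
    ls = sm.OLS(y, X[:, screened]).fit()
    order = screened[np.argsort(-np.abs(ls.tvalues))]

    # Step 3 (selection): scan the nested family induced by the ordering and pick the
    # GIC minimizer (l0-penalized LS), greedy relative to an exhaustive all-subsets search.
    rss0 = np.sum(y ** 2)                      # null model (no intercept in this sketch)
    best_crit = n * np.log(rss0 / n)
    best_model = np.array([], dtype=int)
    for k in range(1, order.size + 1):
        model = order[:k]
        rss = np.sum(sm.OLS(y, X[:, model]).fit().resid ** 2)
        crit = n * np.log(rss / n) + penalty * k
        if crit < best_crit:
            best_crit, best_model = crit, model
    return np.sort(best_model)
```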
