Forward Regression for Ultra-High Dimensional Variable Screening

Motivated by the seminal theory of Sure Independence Screening (SIS; Fan and Lv 2008), we investigate another popular and classical variable screening method, namely forward regression (FR). Our theoretical analysis reveals that FR can identify all relevant predictors consistently, even if the predictor dimension is substantially larger than the sample size. In particular, if the dimension of the true model is finite, FR can discover all relevant predictors within a finite number of steps. To select the “best” candidate from the models generated by FR in practice, the recently proposed extended BIC of Chen and Chen (2008) can be used. The resulting model can then serve as an excellent starting point from which many existing variable selection methods (e.g., SCAD and Adaptive LASSO) can be applied directly. FR’s outstanding finite-sample performance is confirmed by extensive numerical studies.
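As a rough illustration of the procedure described above, the following Python sketch runs greedy forward regression on simulated data with more predictors than observations, then picks a model size with an extended-BIC-style score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `forward_regression` and `ebic_select`, the simulated data, and the particular score log(RSS/n) + k(log n + 2 log d)/n are all illustrative choices, and the predictors and response are assumed to be centered so that no intercept is fitted.

```python
import numpy as np


def forward_regression(X, y, max_steps):
    """Greedy forward regression (hypothetical helper).

    At each step, add the predictor whose inclusion gives the smallest
    residual sum of squares (RSS). Returns the ordered list of chosen
    column indices and the RSS after each step.
    """
    n, d = X.shape
    selected, rss_path = [], []
    for _ in range(max_steps):
        best_j, best_rss = None, np.inf
        for j in range(d):
            if j in selected:
                continue
            cols = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = float(np.sum((y - cols @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        rss_path.append(best_rss)
    return selected, rss_path


def ebic_select(rss_path, n, d):
    """Return the model size minimizing an extended-BIC-style score.

    Assumed score for a model of size k: log(RSS_k / n) + k * (log n + 2 log d) / n.
    """
    scores = [
        np.log(rss / n) + k * (np.log(n) + 2.0 * np.log(d)) / n
        for k, rss in enumerate(rss_path, start=1)
    ]
    return int(np.argmin(scores)) + 1


# Simulated example (assumed setup): d = 200 predictors, n = 100 observations,
# and three truly relevant predictors.
rng = np.random.default_rng(0)
n, d = 100, 200
true_support = [3, 17, 42]
X = rng.standard_normal((n, d))
y = 3.0 * X[:, 3] - 2.0 * X[:, 17] + 1.5 * X[:, 42] + 0.5 * rng.standard_normal(n)

path, rss_path = forward_regression(X, y, max_steps=10)
k_best = ebic_select(rss_path, n, d)
selected = path[:k_best]
print("FR path:", path)
print("Selected model:", sorted(selected))
```

The selected index set could then be handed to a penalized method such as SCAD or the adaptive LASSO, restricted to those columns. The exhaustive inner loop refits a least-squares problem for every candidate, which is fine for a small demonstration but would need a more efficient update for very large d.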

[1]  P. McCullagh,et al.  Generalized Linear Models , 1992 .

[2]  L. Breiman Better subset regression using the nonnegative garrote , 1995 .

[3]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[4]  Wenjiang J. Fu Penalized Regressions: The Bridge versus the Lasso , 1998 .

[5]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[6]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[7]  H. Zou,et al.  Regression Shrinkage and Selection via the Elastic Net, with Applications to Microarrays , 2003 .

[8]  Jianqing Fan,et al.  Nonconcave penalized likelihood with a diverging number of parameters , 2004, math/0406466.

[9]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[10]  D. Hunter,et al.  Variable Selection using MM Algorithms , 2005, Annals of Statistics.

[11]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[12]  M. Yuan,et al.  On the Nonnegative Garrote Estimator , 2005 .

[13]  Runze Li,et al.  Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery , 2006, math/0602133.

[14]  Jianqing Fan,et al.  Sure independence screening for ultrahigh dimensional feature space , 2006, math/0612857.

[15]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[16]  Victoria Stodden,et al.  Breakdown Point of Model Selection When the Number of Variables Exceeds the Number of Observations , 2006, The 2006 IEEE International Joint Conference on Neural Network Proceedings.

[17]  Peng Zhao,et al.  On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res.

[18]  G. Wahba,et al.  A NOTE ON THE LASSO AND RELATED PROCEDURES IN MODEL SELECTION , 2006 .

[19]  M. Yuan,et al.  On the non‐negative garrotte estimator , 2007 .

[20]  Terence Tao,et al.  The Dantzig selector: Statistical estimation when P is much larger than n , 2005, math/0506081.

[21]  Hao Helen Zhang,et al.  Adaptive Lasso for Cox's proportional hazards model , 2007 .

[22]  Yingcun Xia,et al.  Variable selection for the single‐index model , 2007 .

[23]  H. Zou,et al.  One-step Sparse Estimates in Nonconcave Penalized Likelihood Models , 2008, Annals of Statistics.

[24]  Bin Yu,et al.  On Model Selection Consistency of the Elastic Net When p >> n , 2008 .

[25]  C. Robert Discussion of "Sure independence screening for ultra-high dimensional feature space" by Fan and Lv. , 2008 .

[26]  P. Bickel,et al.  Regularized estimation of large covariance matrices , 2008, 0803.1909.

[27]  Cun-Hui Zhang,et al.  Adaptive Lasso for sparse high-dimensional regression models , 2008 .

[28]  A. Barron,et al.  Approximation and learning by greedy algorithms , 2008, 0803.1718.

[29]  Jeffrey S. Morris,et al.  Sure independence screening for ultrahigh dimensional feature space Discussion , 2008 .

[30]  Jiahua Chen,et al.  Extended Bayesian information criteria for model selection with large model spaces , 2008 .

[32]  Cun-Hui Zhang,et al.  The sparsity and bias of the Lasso selection in high-dimensional linear regression , 2008, 0808.0967.

[33]  J. Horowitz,et al.  Asymptotic properties of bridge estimators in sparse high-dimensional regression models , 2008, 0804.0693.

[34]  Hao Helen Zhang,et al.  On the Adaptive Elastic-Net with a Diverging Number of Parameters , 2009, Annals of Statistics.