Consistent bi-level variable selection via composite group bridge penalized regression

We study composite group bridge penalized regression methods for bi-level variable selection in high-dimensional linear regression models with a diverging number of predictors. The proposed method combines the ideas of bridge regression (Huang et al., 2008a) and group bridge regression (Huang et al., 2009) to achieve variable selection consistency at both the group and the individual level simultaneously. That is, the important groups and the important individual variables within each group can both be correctly identified with probability tending to one as the sample size increases to infinity. The method takes full advantage of prior grouping information, and the established bi-level oracle properties ensure that it is immune to possible group misidentification. A related adaptive group bridge estimator, which uses adaptive penalization to improve bi-level selection, is also investigated. Simulation studies show that the proposed methods outperform many existing methods.
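The abstract does not state the penalty explicitly, so the following Python sketch is illustrative only. It evaluates two penalties:

- the bridge penalty of Huang et al. (2008a), sum_k |beta_k|^gamma with 0 < gamma < 1;
- the group bridge penalty of Huang et al. (2009), sum_j c_j * ||beta_{A_j}||_1^gamma, where we assume the group-size weights c_j = |A_j|^(1 - gamma).

The "composite" form sum_j (sum_{k in A_j} |beta_k|^gamma_inner)^gamma_outer is an assumption modeled on the composite absolute penalties family (Zhao et al., 2009). It is not necessarily the exact penalty studied in this paper. All function names and the parameters `gamma_inner` and `gamma_outer` are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Groups = Sequence[Sequence[int]]  # each group A_j is a list of coefficient indices


def bridge_penalty(beta: Vector, gamma: float) -> float:
    """Bridge penalty sum_k |beta_k|^gamma (Huang et al., 2008a take 0 < gamma < 1)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return sum(abs(b) ** gamma for b in beta)


def group_bridge_penalty(beta: Vector, groups: Groups, gamma: float) -> float:
    """Group bridge penalty sum_j c_j * ||beta_{A_j}||_1^gamma.

    The weights c_j = |A_j|^(1 - gamma) are an assumed group-size adjustment.
    """
    total = 0.0
    for idx in groups:
        l1_norm = sum(abs(beta[k]) for k in idx)  # L1 norm of the group's coefficients
        weight = len(idx) ** (1.0 - gamma)
        total += weight * l1_norm ** gamma
    return total


def composite_group_bridge_penalty(
    beta: Vector, groups: Groups, gamma_inner: float, gamma_outer: float
) -> float:
    """ASSUMED CAP-style composite form, hypothetical and not taken from the paper:
    sum_j (sum_{k in A_j} |beta_k|^gamma_inner)^gamma_outer.
    """
    total = 0.0
    for idx in groups:
        inner = sum(abs(beta[k]) ** gamma_inner for k in idx)  # bridge term within group j
        total += inner ** gamma_outer  # bridge-type term across groups
    return total


def penalized_objective(
    y: Vector,
    X: Sequence[Vector],
    beta: Vector,
    lam: float,
    penalty: Callable[[Vector], float],
) -> float:
    """Residual sum of squares ||y - X beta||^2 plus lam * penalty(beta)."""
    rss = 0.0
    for yi, xi in zip(y, X):
        fitted = sum(x * b for x, b in zip(xi, beta))
        rss += (yi - fitted) ** 2
    return rss + lam * penalty(beta)


if __name__ == "__main__":
    beta = [1.0, 0.0, 4.0, 0.0, 0.0]
    groups = [[0, 1], [2], [3, 4]]
    print(bridge_penalty(beta, 0.5))  # 3.0
    print(group_bridge_penalty(beta, groups, 0.5))
    print(composite_group_bridge_penalty(beta, groups, 0.5, 0.5))
```

In this assumed form, an all-zero group contributes nothing to the penalty. Within a nonzero group, an inner exponent below one still favors exact zeros, which is one intuitive reading of "bi-level" selection. Setting `gamma_inner = 1` collapses the sketch to an unweighted group bridge penalty.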

[1]  C. Mallows. Some Comments on Cp. 2000, Technometrics.

[2]  M. Yuan et al. Model selection and estimation in regression with grouped variables. 2006.

[3]  P. Breheny. Regularized methods for high-dimensional and bi-level variable selection. 2009.

[4]  Lukas Meier et al. The group lasso for logistic regression. 2008.

[5]  Cun-Hui Zhang et al. The sparsity and bias of the Lasso selection in high-dimensional linear regression. 2008, arXiv:0808.0967.

[6]  Peng Zhao et al. On Model Selection Consistency of Lasso. 2006, J. Mach. Learn. Res.

[7]  Hui Zou et al. On the adaptive elastic-net with a diverging number of parameters. 2009, Annals of Statistics.

[8]  Patrick Breheny et al. Penalized methods for bi-level variable selection. 2009, Statistics and Its Interface.

[9]  Jian Huang et al. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. 2008, arXiv:0804.0693.

[10]  Jianqing Fan et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. 2001.

[11]  R Core Team. R: A language and environment for statistical computing. 2014.

[12]  G. Schwarz. Estimating the Dimension of a Model. 1978.

[13]  Cun-Hui Zhang. Penalized linear unbiased selection. 2007.

[14]  Jian Huang et al. A group bridge approach for variable selection. 2009, Biometrika.

[15]  R. Tibshirani. Regression Shrinkage and Selection via the Lasso. 1996.

[16]  Peter Bühlmann et al. Statistics for High-Dimensional Data. 2011.

[17]  Jerome Friedman et al. Pathwise coordinate optimization. 2007, arXiv:0708.1485.

[18]  H. Zou. The Adaptive Lasso and Its Oracle Properties. 2006.

[19]  P. Zhao et al. The composite absolute penalties family for grouped and hierarchical variable selection. 2009, arXiv:0909.0411.

[20]  Hongzhe Li et al. Group SCAD regression analysis for microarray time course gene expression data. 2007, Bioinformatics.

[21]  Mee Young Park et al. L1-regularization path algorithm for generalized linear models. 2007.

[22]  Keith Knight et al. Asymptotics for lasso-type estimators. 2000.

[23]  Hui Zou et al. On the “degrees of freedom” of the lasso. 2007, arXiv:0712.0881.

[24]  H. Zou et al. One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. 2008, Annals of Statistics.

[25]  Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. 2010, arXiv:1002.4734.

[26]  H. Zou et al. Regularization and variable selection via the elastic net. 2005.

[27]  Jian Huang et al. A Selective Review of Group Selection in High-Dimensional Models. 2012, Statistical Science.

[28]  M. Stone. Cross-validation and multinomial prediction. 1974.

[29]  Trevor Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. 2009, Springer Series in Statistics.