Regularizing lasso: A consistent variable selection method

The LASSO for variable selection in linear regression has been studied by many authors. It is well known that, to achieve asymptotic selection consistency, LASSO requires a strong irrepresentable condition; even adding a thresholding step after LASSO remains too conservative, especially when the number of explanatory variables p is much larger than the number of observations n. Another well-known method, sure independence screening (SIS), applies thresholding to an estimator of the marginal covariate effect vector and is therefore not selection consistent unless the zero components of the marginal covariate effect vector asymptotically coincide with the zero components of the regression effect vector. Since the weakness of LASSO stems from its use of the covariate sample covariance matrix, which is not well behaved when p is larger than n, we propose a regularized LASSO (RLASSO) method that replaces the covariate sample covariance matrix in LASSO with a regularized estimator of the covariate covariance matrix and adds a thresholding step. With a regularized covariance estimator, we can consistently estimate the regression effects; hence our method also extends and improves the SIS method, which estimates marginal covariate effects. We establish selection consistency of RLASSO under the conditions that the regression effect vector is sparse and that the covariate covariance matrix or its inverse is sparse. Simulation results comparing the variable selection performance of RLASSO with that of various other methods are presented, along with a data example.
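The core idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the entrywise covariance thresholding, the small ridge term for numerical stability, and both threshold parameters (`cov_threshold`, `beta_threshold`) are illustrative assumptions. The sketch regularizes the sample covariance of the covariates, solves the resulting normal equations for the regression effect vector, and then thresholds that estimate for variable selection.

```python
import numpy as np

def rlasso_sketch(X, y, cov_threshold=0.1, beta_threshold=0.5):
    """Hypothetical sketch of the RLASSO idea (not the paper's exact estimator).

    1. Regularize the sample covariance of the covariates by entrywise
       thresholding (keeping the diagonal intact).
    2. Solve the regularized normal equations for the regression effects.
    3. Threshold the estimated effects to select variables.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # center covariates
    yc = y - y.mean()                  # center response
    S = Xc.T @ Xc / n                  # covariate sample covariance matrix
    # Entrywise thresholding as an illustrative regularized covariance estimator
    S_reg = np.where(np.abs(S) >= cov_threshold, S, 0.0)
    np.fill_diagonal(S_reg, np.diag(S))
    rho = Xc.T @ yc / n                # marginal covariate effects (as used by SIS)
    # Small ridge term added purely for numerical stability (assumption)
    beta = np.linalg.solve(S_reg + 1e-8 * np.eye(p), rho)
    # Thresholding step: selected variables are those with large estimated effect
    selected = np.flatnonzero(np.abs(beta) >= beta_threshold)
    return beta, selected

# Toy usage: sparse true effect vector, well-conditioned covariates
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:2] = 2.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)
beta_hat, selected = rlasso_sketch(X, y)
```

Note how the sketch estimates regression effects (via the solved linear system) rather than the marginal effects `rho` alone, which is what distinguishes this approach from pure marginal screening.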
