VARIABLE SELECTION AND ESTIMATION WITH THE SEAMLESS-L0 PENALTY

Penalized least squares procedures that directly penalize the number of variables in a regression model (L0 penalized least squares procedures) enjoy nice theoretical properties and are intuitively appealing. However, they also have significant drawbacks: implementation is NP-hard and is not computationally feasible when the number of variables is even moderately large. One of the challenges is the discontinuity of the L0 penalty. We propose the seamless-L0 (SELO) penalty, a smooth function on [0,∞) that very closely resembles the L0 penalty. The SELO penalized least squares estimator is shown to select the correct model consistently and to be asymptotically normal, provided the number of variables grows more slowly than the number of observations. SELO is efficiently implemented using a coordinate descent algorithm. Since tuning parameter selection is crucial to the performance of the SELO procedure, we propose a BIC-like tuning parameter selection method for SELO and show that it consistently identifies the correct model while allowing the number of variables to diverge. Simulation results show that the SELO procedure with BIC tuning parameter selection performs well in a variety of settings, outperforming other popular penalized least squares procedures by a substantial margin. Using SELO, we analyze a publicly available HIV drug resistance and mutation dataset and obtain interpretable results.
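
To make the claim that SELO "very closely resembles" the L0 penalty concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the penalty formula, so the code assumes the form commonly given for SELO, p(β) = (λ / log 2) · log(|β| / (|β| + τ) + 1), with a small τ > 0. The function names `l0_penalty` and `selo_penalty` and the values of λ and τ are illustrative assumptions.

```python
import math


def l0_penalty(beta, lam):
    """L0 penalty: a constant cost lam for any nonzero coefficient."""
    return lam if beta != 0 else 0.0


def selo_penalty(beta, lam, tau):
    """Assumed SELO form: (lam / log 2) * log(|beta| / (|beta| + tau) + 1).

    Continuous in beta, equal to 0 at beta = 0, and approaching lam as
    |beta| grows; a smaller tau gives a closer match to the L0 penalty.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    a = abs(beta)
    return (lam / math.log(2)) * math.log(a / (a + tau) + 1.0)


if __name__ == "__main__":
    lam, tau = 1.0, 0.01  # illustrative values, not taken from the paper
    for b in (0.0, 0.001, 0.01, 0.1, 1.0, 10.0):
        print(f"beta={b:<6} L0={l0_penalty(b, lam):.3f} "
              f"SELO={selo_penalty(b, lam, tau):.3f}")
```

Under this assumed form, the two penalties agree exactly at zero and differ noticeably only for coefficients whose magnitude is comparable to τ, which is where SELO trades the L0 jump for a continuous ramp.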
