ET-Lasso: Efficient Tuning of Lasso for High-Dimensional Data

The L1 regularization (Lasso) has proven to be a versatile tool for selecting relevant features and estimating model coefficients simultaneously. Despite its popularity, it is very challenging to guarantee the feature selection consistency of Lasso. One way to improve feature selection consistency is to select an ideal tuning parameter. Traditional tuning criteria, such as cross-validation and BIC, mainly focus on minimizing the estimated prediction error or maximizing the posterior model probability; they may be time-consuming or may fail to control the false discovery rate (FDR) when the number of features is extremely large. Another way is to introduce pseudo-features to learn the importance of the original ones. Recently, the Knockoff filter was proposed to control the FDR when performing feature selection. However, its performance is sensitive to the choice of the expected FDR threshold. Motivated by these ideas, we propose a new method that uses pseudo-features to obtain an ideal tuning parameter. In particular, we present the Efficient Tuning of Lasso (ET-Lasso), which separates active and inactive features by adding permuted features as pseudo-features in linear models. Because the pseudo-features are inactive by construction, they yield a cutoff for selecting the tuning parameter that separates active from inactive features. Experimental studies on both simulations and real-world data applications show that ET-Lasso effectively and efficiently selects active features under a wide range of scenarios.
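
The following is a minimal sketch of the pseudo-feature idea described in the abstract, not the authors' exact ET-Lasso procedure: each column of the design matrix is permuted to create features that are inactive by construction, the Lasso path is fit on the augmented design, and the tuning parameter is chosen as the smallest value at which no permuted feature has entered the model. The data-generating setup and the use of scikit-learn's `lasso_path` are illustrative assumptions.

```python
# Sketch of tuning Lasso with permuted pseudo-features (illustrative, not the
# paper's exact algorithm).
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0                          # first 5 features are truly active
y = X @ beta + rng.standard_normal(n)

# Permute each column independently to break its association with y;
# the permuted copies serve as known-inactive pseudo-features.
X_perm = np.apply_along_axis(rng.permutation, 0, X)
X_aug = np.hstack([X, X_perm])

# Fit the Lasso solution path on the augmented design.
alphas, coefs, _ = lasso_path(X_aug, y)     # coefs: (2p, n_alphas), alphas decreasing

# Cutoff: keep only tuning parameters at which every pseudo-feature coefficient
# is still exactly zero, and take the smallest (least penalized) of them.
pseudo_active = np.any(coefs[p:, :] != 0, axis=0)
valid = np.flatnonzero(~pseudo_active)
alpha_star = alphas[valid[-1]]

selected = np.flatnonzero(coefs[:p, valid[-1]] != 0)
print("chosen tuning parameter:", alpha_star)
print("selected original features:", selected)
```

In this toy example the cutoff tends to retain the truly active features while excluding most inactive ones, which mirrors the separation property the abstract attributes to the pseudo-features.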
