Path Following and Empirical Bayes Model Selection for Sparse Regression

In recent years, a rich variety of regularization procedures have been proposed for high-dimensional regression problems. However, tuning-parameter choice and computational efficiency in ultra-high-dimensional problems remain vexing issues. The routine use of $\ell_1$ regularization is largely attributable to the computational efficiency of the LARS algorithm, but comparable efficiency for better-behaved penalties has remained elusive. In this article, we propose a highly efficient path-following procedure for the combination of any convex loss function and a broad class of penalties. From a Bayesian perspective, this algorithm rapidly yields maximum a posteriori estimates across hyper-parameter values. To bypass the inefficiency and potential instability of cross-validation, we propose an empirical Bayes procedure that rapidly selects the optimal model and the corresponding hyper-parameter value. This approach applies to any penalty that corresponds to a proper prior distribution on the regression coefficients. While we focus mainly on sparse estimation of generalized linear models (GLMs), the method extends, after reparameterization, to more general regularization problems such as polynomial trend filtering. The proposed algorithm scales efficiently to large $p$ and/or $n$: solution paths for 10,000-dimensional examples are computed within one minute on a laptop for various GLMs. Operating characteristics are assessed through simulation studies, and the methods are applied to several real data sets.
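
The paper's actual procedure follows the solution path continuously and handles general convex losses and penalties, with an empirical Bayes criterion for model selection; neither is reproduced here. As a simplified illustration of the core idea of tracing estimates over a hyper-parameter grid with warm starts and then selecting a model along the path, the sketch below uses coordinate descent for the lasso and a BIC-type selector as a stand-in for the empirical Bayes criterion. All function names, the grid construction, and the selector are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_path(X, y, n_lambda=50, eps=1e-3, tol=1e-8, max_iter=500):
    """Trace a lasso solution path by coordinate descent with warm starts.

    Assumes columns of X are standardized and y is centered. This is a
    simplified stand-in for the paper's path-following algorithm.
    """
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n   # smallest lambda giving beta = 0
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambda)
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # per-coordinate curvature terms
    r = y - X @ beta                        # running residual
    path = []
    for lam in lambdas:                     # warm start: reuse previous beta
        for _ in range(max_iter):
            max_delta = 0.0
            for j in range(p):
                old = beta[j]
                rho = X[:, j] @ r / n + col_sq[j] * old
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
                if beta[j] != old:
                    r -= X[:, j] * (beta[j] - old)
                    max_delta = max(max_delta, abs(beta[j] - old))
            if max_delta < tol:
                break
        path.append((lam, beta.copy()))
    return path

def bic_select(X, y, path):
    """Pick the path point minimizing a BIC-type criterion; the paper uses
    an empirical Bayes criterion instead, not reproduced here."""
    n = X.shape[0]
    return min(
        path,
        key=lambda lb: n * np.log(np.mean((y - X @ lb[1]) ** 2))
                       + np.log(n) * np.count_nonzero(lb[1]),
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    X = (X - X.mean(0)) / X.std(0)          # standardize columns
    beta_true = np.zeros(p)
    beta_true[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
    y = X @ beta_true + rng.standard_normal(n)
    y -= y.mean()                           # center the response
    lam_hat, beta_hat = bic_select(X, y, lasso_path(X, y))
    print(f"selected lambda = {lam_hat:.4f}, "
          f"nonzeros = {np.count_nonzero(beta_hat)}")
```

Warm starting each grid point from the previous solution is what makes the whole path roughly as cheap as a single fit, since consecutive solutions differ only slightly; the paper's continuous path following pushes this idea further by dispensing with the grid entirely.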
