Piecewise linear regularized solution paths

We consider the generic regularized optimization problem β(λ) = argmin_β L(y, Xβ) + λJ(β). Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407-499] have shown that for the LASSO, that is, when L is squared error loss and J(β) = ∥β∥₁ is the ℓ1 norm of β, the optimal coefficient path is piecewise linear, meaning ∂β(λ)/∂λ is piecewise constant. We derive a general characterization of the properties of (loss L, penalty J) pairs that give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of the efficient path-following algorithms that arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer's locally adaptive regression splines.
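The piecewise linearity discussed above can be seen in closed form in the simplest case: with an orthonormal design X, the LASSO solution is coordinate-wise soft-thresholding of b = Xᵀy, namely β_j(λ) = sign(b_j)·max(|b_j| − λ, 0), which is piecewise linear in λ with a breakpoint at |b_j|. The sketch below (not from the paper; the function name and data are illustrative) traces this path on a grid of λ values:

```python
import numpy as np

def lasso_path_orthonormal(b, lams):
    """LASSO coefficient path beta(lambda) when X is orthonormal.

    For orthonormal X the LASSO solution decouples into coordinate-wise
    soft-thresholding of b = X^T y; each coordinate is piecewise linear
    in lambda, with slope -sign(b_j) until it hits zero at lambda = |b_j|.
    """
    lams = np.asarray(lams)[:, None]          # shape (n_lambda, 1)
    return np.sign(b) * np.maximum(np.abs(b) - lams, 0.0)

# Toy example: three coefficients with breakpoints at 3.0, 1.5, 0.5.
b = np.array([3.0, -1.5, 0.5])
lams = np.linspace(0.0, 4.0, 9)               # grid step 0.5
path = lasso_path_orthonormal(b, lams)

# At lambda = 0 the path recovers b; for lambda >= max|b_j| it is zero.
# Between breakpoints, each coordinate moves with constant slope, so
# d beta / d lambda is piecewise constant, as the abstract states.
```

For general (non-orthonormal) X the breakpoints instead occur where variables enter or leave the active set, and the full path is generated by the LARS-type algorithms the paper builds on.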

[1] F. R. Forst et al. On robust estimation of the location parameter. 1980.

[2] G. Wahba. Spline Models for Observational Data. 1990.

[3] P. T. Ng et al. Quantile smoothing splines. 1994.

[4] I. Johnstone et al. Wavelet shrinkage: Asymptopia? 1995.

[5] Y. Freund et al. Experiments with a new boosting algorithm. ICML, 1996.

[6] R. Tibshirani. Regression shrinkage and selection via the lasso. 1996.

[7] E. Mammen and S. van de Geer. Locally adaptive regression splines. 1997.

[8] A. E. Hoerl et al. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 2000.

[9] M. R. Osborne et al. On the LASSO and its dual. 2000.

[10] V. N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science, 2000.

[11] M. R. Osborne et al. A new approach to variable selection in least squares problems. 2000.

[12] P. Davies et al. Local extremes, runs, strings and multiresolution. 2001.

[13] J. Fan et al. Variable selection via nonconcave penalized likelihood and its oracle properties. 2001.

[14] J. Fan et al. Regularization of wavelet approximations. 2001.

[15] R. Tibshirani et al. 1-norm support vector machines. NIPS, 2003.

[16] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. 2003.

[17] A. Bemporad et al. An algorithm for multi-parametric quadratic programming and explicit MPC solutions. Automatica, 2003.

[18] W. Wong et al. On ψ-learning. 2003.

[19] T. Hastie et al. Discussion of boosting papers. 2003.

[20] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

[21] A. McCallum et al. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. J. Mach. Learn. Res., 2004.

[22] J. Fan et al. Nonconcave penalized likelihood with a diverging number of parameters. 2004, math/0406466.

[23] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani. Least angle regression. Ann. Statist. 32 (2004) 407-499.

[24] D. Madigan et al. [Least angle regression]: Discussion. 2004.

[25] B. Turlach. Discussion of "Least Angle Regression" by Efron, Hastie, Johnstone and Tibshirani. 2004.

[26] R. Tibshirani et al. The entire regularization path for the support vector machine. J. Mach. Learn. Res., 2004.

[27] R. Tibshirani et al. On the "degrees of freedom" of the lasso. 2007, 0712.0881.

[28] A. Ng. Feature selection, L1 vs. L2 regularization, and rotational invariance. ICML, 2004.

[29] J. Zhu et al. Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res., 2004.

[30] G. Rätsch et al. Image reconstruction by linear programming. IEEE Transactions on Image Processing, 2003.

[31] R. Tibshirani et al. Sparsity and smoothness via the fused lasso. 2005.