L1-regularization path algorithm for generalized linear models

We introduce a path-following algorithm for L1-regularized generalized linear models. L1 regularization is especially useful because, in effect, it selects variables according to the amount of penalization on the L1-norm of the coefficients, in a manner less greedy than forward selection–backward deletion. The generalized linear model path algorithm efficiently computes solutions along the entire regularization path using the predictor–corrector method of convex optimization. The step length in the regularization parameter is critical to the overall accuracy of the computed paths; we suggest intuitive and flexible strategies for choosing appropriate values. We demonstrate the implementation on several simulated and real data sets.
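To make the path-following idea concrete, the following is a minimal sketch for L1-penalized logistic regression: the regularization parameter decreases over a grid, the solution at the previous value serves as a (zeroth-order) predictor, and a proximal-gradient solve acts as the corrector. The function name, the geometric grid, and the fixed-step ISTA corrector are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def logistic_lasso_path(X, y, n_steps=20, n_corrector=200, lr=0.1):
    """Illustrative predictor-corrector style path for L1-penalized
    logistic regression (not the paper's exact algorithm).
    Predictor: warm start from the previous solution.
    Corrector: proximal-gradient (ISTA) steps at each lambda."""
    n, p = X.shape
    # Smallest lambda at which the all-zero solution is stationary:
    # the max absolute gradient coordinate of the log-likelihood at beta = 0.
    lam_max = np.max(np.abs(X.T @ (y - 0.5))) / n
    lambdas = lam_max * np.geomspace(1.0, 0.01, n_steps)
    beta = np.zeros(p)
    path = []
    for lam in lambdas:
        # Predictor step: reuse beta from the previous lambda (warm start).
        for _ in range(n_corrector):
            # Corrector: gradient step on the negative log-likelihood ...
            mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
            grad = X.T @ (mu - y) / n
            z = beta - lr * grad
            # ... followed by soft-thresholding for the L1 penalty.
            beta = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
        path.append(beta.copy())
    return lambdas, np.array(path)
```

As the penalty decreases along the grid, coefficients enter the active set one after another, which is the variable-selection behavior the abstract describes.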
