A New Regularization Path for Logistic Regression via Linearized Bregman

Sparse logistic regression is an important linear classifier in statistical learning, providing an attractive route to feature selection. A popular approach minimizes the logistic loss together with an l1-regularization term, where the regularization parameter λ controls the sparsity of the solution. To determine an appropriate value of λ, one can apply grid search or a Bayesian approach. Grid search requires constructing a regularization path by solving a sequence of minimization problems for varying values of the regularization parameter, which is typically time-consuming. In this paper, we introduce a fast procedure that generates a new regularization path without tuning the regularization parameter. We first derive the direct Bregman method by replacing the l1-norm with a Bregman divergence, and contrast it with grid search. For faster path computation, we further derive the linearized Bregman algorithm, which is algebraically simple and computationally efficient. Finally, we report empirical results for the linearized Bregman algorithm on benchmark data and study feature selection as an inverse problem. Compared with grid search, the linearized Bregman algorithm generates a different regularization path with comparable classification performance at a much lower computational cost.

AMS classification scheme numbers: 65, 62, 35

Submitted to: Inverse Problems
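
For concreteness, below is a minimal sketch of how a linearized Bregman iteration can be applied to the logistic loss to trace out a path of increasingly dense sparse solutions. It follows the standard linearized Bregman form (a gradient step on an auxiliary variable followed by soft-thresholding); the function names and the parameters mu, alpha, and delta are illustrative assumptions, not the paper's exact algorithm or settings.

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Gradient of the average logistic loss for labels y in {-1, +1}:
    L(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i^T w))."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))   # -y_i * sigmoid(-margin_i)
    return X.T @ coeffs / X.shape[0]

def shrink(v, mu):
    """Soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def linearized_bregman_logistic(X, y, mu=1.0, alpha=1.0, delta=1.0, n_iter=500):
    """Linearized Bregman iteration on the logistic loss.

    Each iterate w_k is sparse; collecting the iterates as k grows
    traces out a regularization path without tuning lambda.
    """
    n_features = X.shape[1]
    v = np.zeros(n_features)   # accumulator (dual-like) variable
    w = np.zeros(n_features)   # sparse primal iterate
    path = []
    for _ in range(n_iter):
        v -= alpha * logistic_loss_grad(w, X, y)   # gradient step on v
        w = delta * shrink(v, mu)                  # shrinkage induces sparsity
        path.append(w.copy())
    return np.array(path)
```

In this sketch the sequence of iterates plays the role of the path: the coefficients start at zero and features enter gradually as the iteration proceeds, so sweeping over iterations replaces sweeping over λ.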
