Linearized Bregman for l1-Regularized Logistic Regression

Sparse logistic regression is an important linear classifier in statistical learning, providing an attractive route to feature selection. A popular approach minimizes the logistic loss plus an l1-regularization term, where a regularization parameter λ controls the sparsity of the solution. To determine an appropriate value of the regularization parameter, one can apply grid search or a Bayesian approach. Grid search requires constructing a regularization path by solving a sequence of minimization problems over varying values of the regularization parameter, which is typically time-consuming. In this paper, we introduce a fast procedure that generates a new regularization path without tuning the regularization parameter. We first derive the direct Bregman method by replacing the l1-norm with its Bregman divergence, and contrast it with grid search. For faster path computation, we then derive the linearized Bregman algorithm, which is algebraically simple and computationally efficient. Finally, we present empirical results for the linearized Bregman algorithm on benchmark data and study feature selection as an inverse problem. Compared with grid search, the linearized Bregman algorithm generates a different regularization path with comparable classification performance at a much lower computational cost.
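To make the path-generating idea concrete, here is a minimal sketch of a generic linearized Bregman iteration applied to the logistic loss: a gradient step on an auxiliary variable followed by soft-thresholding, so the iterates trace a path from very sparse to denser solutions. The parameter names (mu, delta, step), the stopping rule, and the toy data are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid


def soft_threshold(v, mu):
    """Componentwise soft-thresholding (shrinkage) operator."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)


def logistic_grad(w, X, y):
    """Gradient of the average logistic loss (1/n) * sum_i log(1 + exp(-y_i x_i^T w))."""
    margins = y * (X @ w)
    coeff = -y * expit(-margins)
    return X.T @ coeff / X.shape[0]


def linearized_bregman_logreg(X, y, mu=1.0, delta=1.0, step=1.0, n_iter=500):
    """Generic linearized Bregman iteration for l1-regularized logistic regression.

    Maintains an auxiliary (dual) variable v and the primal iterate
    w = delta * shrink(v, mu); collecting the iterates yields a regularization
    path without sweeping over lambda.
    """
    n, d = X.shape
    v = np.zeros(d)
    w = np.zeros(d)
    path = []
    for _ in range(n_iter):
        v -= step * logistic_grad(w, X, y)   # gradient step on the auxiliary variable
        w = delta * soft_threshold(v, mu)    # shrinkage keeps the iterate sparse
        path.append(w.copy())
    return w, path


# Toy usage on synthetic data (labels in {-1, +1}); purely illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0
y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
w_hat, path = linearized_bregman_logreg(X, y)
print("nonzero coefficients:", np.count_nonzero(w_hat))
```

Early iterates in `path` are the sparsest; as the iteration proceeds, more features enter the model, which is what plays the role of the lambda-indexed path produced by grid search.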
