An improved GLMNET for l1-regularized logistic regression

GLMNET, proposed by Friedman et al. [22], is an algorithm for generalized linear models with elastic-net regularization, and it has been widely applied to L1-regularized logistic regression. However, recent experiments indicate that the existing GLMNET implementation may not be stable for large-scale problems. In this paper, we propose an improved GLMNET that addresses both theoretical and implementation issues. In particular, as a Newton-type method, GLMNET achieves fast local convergence but may fail to obtain a useful solution quickly. By carefully adjusting the effort spent in each iteration, our method is efficient whether the optimization problem is solved loosely or strictly. Experiments demonstrate that the improved GLMNET is more efficient than a state-of-the-art coordinate descent method.
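
To make the algorithmic structure concrete, the sketch below shows a GLMNET-style solver for min_w ||w||_1 + C * sum_i log(1 + exp(-y_i w^T x_i)) with y_i in {-1, +1}: each outer Newton iteration builds a quadratic model of the logistic loss, approximately minimizes that model plus the exact L1 term by cyclic coordinate descent, and then applies a backtracking line search. This is a minimal illustrative sketch under our own naming, not the paper's implementation; glmnet_sketch, max_outer, max_inner, sigma, and beta are assumed names and defaults, and the fixed inner-iteration count merely stands in for the paper's adaptive control of per-iteration effort.

import numpy as np

def l1_logreg_obj(X, y, w, C):
    # f(w) = ||w||_1 + C * sum_i log(1 + exp(-y_i * x_i^T w)), computed stably.
    return np.abs(w).sum() + C * np.logaddexp(0.0, -y * (X @ w)).sum()

def glmnet_sketch(X, y, C=1.0, max_outer=30, max_inner=5, sigma=0.01, beta=0.5):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_outer):
        tau = 1.0 / (1.0 + np.exp(y * (X @ w)))  # per-example loss derivatives
        g = -C * (X.T @ (tau * y))               # gradient of the loss term
        D = C * tau * (1.0 - tau)                # Hessian weights of the loss
        h = (X * X).T @ D + 1e-12                # diagonal of X^T D X

        # Inner solver: cyclic coordinate descent on the quadratic model
        # of the loss plus the exact L1 term (soft-threshold updates).
        delta = np.zeros(d)
        Xdelta = np.zeros(n)
        for _ in range(max_inner):
            for j in range(d):
                Gj = g[j] + X[:, j] @ (D * Xdelta)  # model gradient at delta
                wj = w[j] + delta[j]
                if Gj + 1.0 <= h[j] * wj:
                    s = -(Gj + 1.0) / h[j]
                elif Gj - 1.0 >= h[j] * wj:
                    s = -(Gj - 1.0) / h[j]
                else:
                    s = -wj                          # coordinate shrinks to zero
                delta[j] += s
                Xdelta += s * X[:, j]

        # Backtracking (Armijo-style) line search on the true objective.
        f0 = l1_logreg_obj(X, y, w, C)
        model_dec = g @ delta + np.abs(w + delta).sum() - np.abs(w).sum()
        lam = 1.0
        while l1_logreg_obj(X, y, w + lam * delta, C) > f0 + sigma * lam * model_dec:
            lam *= beta
            if lam < 1e-8:
                break
        w = w + lam * delta
    return w

# Illustrative usage on synthetic data (not from the paper):
#   rng = np.random.default_rng(0)
#   X = rng.standard_normal((200, 50))
#   w_true = np.zeros(50); w_true[:5] = 1.0
#   y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))
#   w_hat = glmnet_sketch(X, y, C=1.0)

In the paper's improved GLMNET, the amount of inner coordinate descent work is adjusted per outer iteration rather than fixed as above; that adaptive control is what keeps the method efficient under both loose and strict stopping criteria.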

[1] J. Dunn. Newton's method and the Goldstein step length rule for constrained minimization. 19th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes, 1980.

[2] Thorsten Joachims, et al. Making large-scale SVM learning practical. 1998.

[3] Nicol N. Schraudolph, et al. A Fast, Compact Approximation of the Exponential Function. Neural Computation, 1999.

[4] Chih-Jen Lin, et al. Newton's Method for Large Bound-Constrained Optimization Problems. SIAM Journal on Optimization, 1999.

[5] Gavin C. Cawley, et al. On a Fast, Compact Approximation of the Exponential Function. Neural Computation, 2000.

[6] Olvi L. Mangasarian, et al. A finite Newton method for classification. Optimization Methods and Software, 2002.

[7] I. Daubechies, et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv math/0307152, 2003.

[8] S. Sathiya Keerthi, et al. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 2003.

[9] Honglak Lee, et al. Efficient L1 Regularized Logistic Regression. AAAI, 2006.

[10] Stephen P. Boyd, et al. An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression. Journal of Machine Learning Research, 2007.

[11] David Madigan, et al. Large-Scale Bayesian Logistic Regression for Text Categorization. Technometrics, 2007.

[12] Mark W. Schmidt, et al. Fast Optimization Methods for L1 Regularization: A Comparative Study and Two New Approaches. ECML, 2007.

[13] Jianfeng Gao, et al. Scalable training of L1-regularized log-linear models. ICML, 2007.

[14] Chih-Jen Lin, et al. A dual coordinate descent method for large-scale linear SVM. ICML, 2008.

[15] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 2008.

[16] Chih-Jen Lin, et al. Coordinate Descent Method for Large-scale L2-loss Linear Support Vector Machines. Journal of Machine Learning Research, 2008.

[17] Chih-Jen Lin, et al. Iterative Scaling and Coordinate Descent Methods for Maximum Entropy. ACL/IJCNLP, 2009.

[18] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2009.

[19] Jieping Ye, et al. Large-scale sparse logistic regression. KDD, 2009.

[20] Ambuj Tewari, et al. Stochastic methods for l1 regularized loss minimization. ICML, 2009.

[21] Paul Tseng, et al. A coordinate gradient descent method for nonsmooth separable minimization. Mathematical Programming, 2008.

[22] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 2010.

[23] Shou-De Lin, et al. Feature Engineering and Classifier Ensemble for KDD Cup 2010. KDD Cup 2010 Workshop, 2010.

[24] Wotao Yin, et al. A Fast Hybrid Algorithm for Large-Scale l1-Regularized Logistic Regression. Journal of Machine Learning Research, 2010.

[25] Alexander J. Smola, et al. Bundle Methods for Regularized Risk Minimization. Journal of Machine Learning Research, 2010.

[26] Chih-Jen Lin, et al. A Comparison of Optimization Methods and Software for Large-scale L1-regularized Linear Classification. Journal of Machine Learning Research, 2010.

[27] Laurent El Ghaoui, et al. Safe Feature Elimination in Sparse Supervised Learning. arXiv, 2010.

[28] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2011.

[29] Inderjit S. Dhillon, et al. Fast coordinate descent methods with variable selection for non-negative matrix factorization. KDD, 2011.

[30] Kim-Chuan Toh, et al. A coordinate gradient descent method for ℓ1-regularized convex minimization. Computational Optimization and Applications, 2011.

[31] Pradeep Ravikumar, et al. Sparse inverse covariance matrix estimation using quadratic approximation. NIPS, 2011.

[32] R. Tibshirani, et al. Strong rules for discarding predictors in lasso-type problems. Journal of the Royal Statistical Society, Series B, 2010.

[33] Chih-Jen Lin, et al. An improved GLMNET for L1-regularized logistic regression. Journal of Machine Learning Research, 2012.