Trust Region Newton Method for Logistic Regression

Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also extend the proposed method to large-scale L2-loss linear support vector machines (SVM).

[1]  J. Darroch,et al.  Generalized Iterative Scaling for Log-Linear Models , 1972 .

[2]  John G. Lewis,et al.  Sparse matrix test problems , 1982, SGNM.

[3]  T. Steihaug The Conjugate Gradient Method and Trust Regions in Large Scale Optimization , 1983 .

[4]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program..

[5]  Jorge Nocedal,et al.  A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large Scale Optimization , 1991, SIAM J. Optim..

[6]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[7]  François-Xavier Le Dimet,et al.  Numerical Experience with Limited-Memory Quasi-Newton and Truncated Newton Methods , 1993, SIAM J. Optim..

[8]  John D. Lafferty,et al.  Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Chih-Jen Lin,et al.  Newton's Method for Large Bound-Constrained Optimization Problems , 1999, SIAM J. Optim..

[10]  S. Nash A survey of truncated-Newton methods , 2000 .

[11]  Olvi L. Mangasarian,et al.  A finite newton method for classification , 2002, Optim. Methods Softw..

[12]  Rob Malouf,et al.  A Comparison of Algorithms for Maximum Entropy Parameter Estimation , 2002, CoNLL.

[13]  Rong Yan,et al.  A Faster Iterative Scaling Algorithm for Conditional Exponential Model , 2003, ICML.

[14]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[15]  Chih-Jen Lin,et al.  Decomposition Methods for Linear Support Vector Machines , 2004, Neural Computation.

[16]  T. Minka A comparison of numerical optimizers for logistic regression , 2004 .

[17]  S. Sathiya Keerthi,et al.  A Modified Finite Newton Method for Fast Solution of Large Scale Linear SVMs , 2005, J. Mach. Learn. Res..

[18]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[19]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields for Relational Learning , 2007 .

[20]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.

[21]  Alexander J. Smola,et al.  Bundle Methods for Machine Learning , 2007, NIPS.

[22]  Chih-Jen Lin,et al.  Trust region Newton methods for large-scale logistic regression , 2007, ICML '07.

[23]  Stephen P. Boyd,et al.  An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007, J. Mach. Learn. Res..

[24]  David Madigan,et al.  Algorithms for Sparse Linear Classifiers in the Massive Data Setting , 2008 .