A Study on Truncated Newton Methods for Linear Classification

Truncated Newton (TN) methods are a useful technique for large-scale optimization. Instead of computing the full Newton direction, a truncated Newton method approximately solves the Newton linear system with an inner conjugate gradient (CG) procedure; we refer to the overall method as TNCG. These methods have been employed to solve linear classification problems efficiently. However, even in this well-studied area, several theoretical and numerical aspects have not been fully explored. The first contribution of this work is a comprehensive study of the global and local convergence of TNCG when applied to linear classification. Because some losses are not twice differentiable, many past convergence results do not apply here; we prove the missing pieces of the theory from scratch and clarify the proper references. The second contribution is a study of the termination of the inner CG procedure. We show, for the first time in the context of TNCG for linear classification, that the inner stopping condition strongly affects the convergence speed, and we propose a quadratic stopping criterion that achieves both robustness and efficiency. The third contribution combines the study of inner stopping criteria with that of preconditioning. We discuss how preconditioning affects the convergence theory and finally propose an effective preconditioned TNCG.
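To make the overall procedure concrete, below is a minimal, self-contained sketch (not the paper's implementation) of a truncated Newton-CG solver for L2-regularized logistic regression. The quadratic inner stopping rule shown here, which truncates CG once the per-iteration decrease of the quadratic model Q(s) = g^T s + 0.5 s^T H s becomes small relative to its average decrease, is in the spirit of Nash's truncated-Newton work; the paper's exact criterion, trust-region or line-search machinery, and preconditioner are not reproduced, and all names and parameter values (eta_q, the Armijo constant, etc.) are illustrative assumptions.

```python
import numpy as np

def fun_grad_sigma(w, X, y, C):
    """L2-regularized logistic regression: value, gradient, and the
    sigmoid terms needed later for Hessian-vector products."""
    z = y * (X @ w)
    sigma = np.exp(-np.logaddexp(0.0, z))        # 1 / (1 + e^z), computed stably
    f = 0.5 * (w @ w) + C * np.logaddexp(0.0, -z).sum()
    g = w - C * (X.T @ (y * sigma))
    return f, g, sigma

def hess_vec(v, X, sigma, C):
    """Hessian-vector product (I + C X^T D X) v without forming the
    Hessian, where D = diag(sigma * (1 - sigma))."""
    d = sigma * (1.0 - sigma)
    return v + C * (X.T @ (d * (X @ v)))

def cg_quadratic_stop(g, Hv, eta_q=0.5, max_iter=250):
    """CG on H s = -g, truncated by a quadratic-model rule: stop at
    iteration i when i * (Q_i - Q_{i-1}) >= eta_q * Q_i, with
    Q(s) = g^T s + 0.5 s^T H s (Q decreases toward negative values)."""
    s = np.zeros_like(g)
    r = -g.copy()                                # residual -g - H s
    p = r.copy()
    rTr = r @ r
    Q_prev = 0.0
    for i in range(1, max_iter + 1):
        Hp = Hv(p)
        alpha = rTr / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        Q = 0.5 * ((g - r) @ s)                  # model value, since H s = -g - r
        if i * (Q - Q_prev) >= eta_q * Q:        # little extra decrease: truncate
            break
        rTr_new = r @ r
        p = r + (rTr_new / rTr) * p
        rTr = rTr_new
        Q_prev = Q
    return s

def truncated_newton(X, y, C=1.0, tol=1e-4, max_newton=100):
    """Outer Newton loop with a backtracking (Armijo) line search."""
    w = np.zeros(X.shape[1])
    for _ in range(max_newton):
        f, g, sigma = fun_grad_sigma(w, X, y, C)
        if np.linalg.norm(g) <= tol:
            break
        s = cg_quadratic_stop(g, lambda v: hess_vec(v, X, sigma, C))
        step, gTs = 1.0, g @ s
        for _ in range(30):                      # Armijo backtracking
            f_new, _, _ = fun_grad_sigma(w + step * s, X, y, C)
            if f_new <= f + 1e-4 * step * gTs:
                break
            step *= 0.5
        w = w + step * s
    return w

# Toy usage on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = np.sign(X @ rng.standard_normal(10))
w = truncated_newton(X, y, C=1.0)
```

A preconditioned variant would replace the residual updates in the CG loop with the usual preconditioned recurrences (e.g., applying the inverse of a diagonal or incomplete-Cholesky approximation of H to r); the quadratic-model rule above carries over unchanged because Q is evaluated in the original variables.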
