Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable. This class of problems includes many functions that are not strongly convex, such as the logistic regression loss. Our linear convergence result is (i) affine-invariant, and it holds even if (ii) an approximate Hessian is used and (iii) the subproblems are solved only approximately. We thus theoretically demonstrate the superiority of Newton's method over first-order methods, which would achieve only a sublinear $O(1/t^2)$ rate under comparable conditions.
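
As a concrete illustration of the setting (and not of the exact algorithm analyzed here), the sketch below runs a plain Newton iteration with a small damping term on unregularized logistic regression, a convex objective whose Hessian can become nearly singular, so it is not strongly convex and its gradient is not globally well-conditioned. The synthetic data, the damping constant, and the stopping rule are all illustrative assumptions.

```python
# Minimal sketch: Newton's method on unregularized logistic regression.
# This is a generic full-step Newton iteration for illustration only; the
# paper's analysis covers damped/approximate variants with their own rules.
import numpy as np

def logistic_loss(w, X, y):
    # Labels y in {-1, +1}; average logistic loss (1/n) sum log(1 + exp(-y x^T w))
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def gradient(w, X, y):
    s = 1.0 / (1.0 + np.exp(y * (X @ w)))   # sigmoid(-y x^T w)
    return -(X.T @ (y * s)) / len(y)

def hessian(w, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))      # sigmoid(x^T w)
    d = p * (1.0 - p)                       # per-sample curvature weights
    return (X.T * d) @ X / len(y)           # (1/n) X^T D X

def newton(X, y, steps=20, damping=1e-8, tol=1e-10):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        g = gradient(w, X, y)
        if np.linalg.norm(g) < tol:
            break
        H = hessian(w, X, y)
        # Tiny damping keeps the linear solve well-posed if H is singular.
        w = w + np.linalg.solve(H + damping * np.eye(len(w)), -g)
    return w

# Illustrative synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200))
w_hat = newton(X, y)
print(logistic_loss(w_hat, X, y))
```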
