A control-theoretic perspective on optimal high-order optimization

We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a convex and twice continuously differentiable function $\varPhi : \mathbb{R}^d \rightarrow \mathbb{R}$, we study a closed-loop control system that is governed by the operators $\nabla \varPhi$ and $\nabla^2 \varPhi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p \Vert \nabla \varPhi(x(t)) \Vert^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We then present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and to analyze the convergence properties of trajectories; the rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of the objective-function gap and $O(1/t^{3p})$ in terms of the squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework of Monteiro and Svaiter (SIAM J Optim 23(2):1092–1125, 2013) and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design. A highlight of our analysis is to show that all of the $p$-th order optimal tensor algorithms that we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analyses in Gasnikov et al. (COLT, PMLR, pp 1374–1391, 2019), Jiang et al. (COLT, PMLR, pp 1799–1801, 2019) and Bubeck et al. (COLT, PMLR, pp 492–507, 2019).
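Concretely, the feedback law is determined pointwise by the current gradient norm: whenever $\nabla \varPhi(x(t)) \ne 0$, the algebraic equation above can be solved for $\lambda(t)$ in closed form (the rearrangement below is included only as a reading aid, not as additional material):

$$\lambda(t) \;=\; \theta^{1/p} \, \bigl\Vert \nabla \varPhi(x(t)) \bigr\Vert^{-(p-1)/p}.$$

In particular, for $p > 1$ the control $\lambda(t)$ grows as the gradient norm shrinks along the trajectory, while for $p = 1$ it reduces to the constant $\theta$.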

[1]  Yurii Nesterov,et al.  Inexact accelerated high-order proximal-point methods , 2020, Mathematical Programming.

[2]  Wilson A. Sutherland,et al.  Introduction to Metric and Topological Spaces , 1975 .

[3]  Michael I. Jordan,et al.  A Lyapunov Analysis of Momentum Methods in Optimization , 2016, ArXiv.

[4]  H. Attouch,et al.  First-order inertial algorithms involving dry friction damping , 2021, Mathematical programming.

[5]  Ohad Shamir,et al.  Oracle complexity of second-order methods for smooth convex optimization , 2017, Mathematical Programming.

[6]  Yurii Nesterov,et al.  Accelerating the cubic regularization of Newton’s method on convex problems , 2005, Math. Program..

[7]  Yee Whye Teh,et al.  Hamiltonian Descent Methods , 2018, ArXiv.

[8]  Sen-Zhong Huang,et al.  Gradient Inequalities: With Applications to Asymptotic Behavior And Stability of Gradient-like Systems , 2006 .

[9]  Benar Fux Svaiter,et al.  Global Convergence of a Closed-Loop Regularized Newton Method for Solving Monotone Inclusions in Hilbert Spaces , 2013, J. Optim. Theory Appl..

[10]  J. Bolte,et al.  A second-order gradient-like dissipative dynamical system with Hessian-driven damping.: Application to optimization and mechanics , 2002 .

[11]  H. Attouch,et al.  Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling , 2020, Evolution Equations & Control Theory.

[12]  Yin Tat Lee,et al.  Near-optimal method for highly smooth convex optimization , 2018, COLT.

[13]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[14]  I. Bihari A generalization of a lemma of Bellman and its application to uniqueness problems of differential equations , 1956 .

[15]  Y. Nesterov Inexact basic tensor methods for some classes of convex optimization problems , 2020 .

[16]  Andre Wibisono,et al.  A variational perspective on accelerated methods in optimization , 2016, Proceedings of the National Academy of Sciences.

[17]  Michael I. Jordan,et al.  Optimization with Momentum: Dynamical, Control-Theoretic, and Symplectic Perspectives , 2020, ArXiv.

[18]  Hedy Attouch,et al.  Newton-like Inertial Dynamics and Proximal Algorithms Governed by Maximally Monotone Operators , 2020, SIAM J. Optim..

[19]  Zaki Chbani,et al.  Fast Convergence of Dynamical ADMM via Time Scaling of Damped Inertial Dynamics , 2021, Journal of Optimization Theory and Applications.

[20]  Yurii Nesterov,et al.  Implementable tensor methods in unconstrained convex optimization , 2019, Mathematical Programming.

[21]  Michael I. Jordan,et al.  Understanding the acceleration phenomenon via high-resolution differential equations , 2018, Mathematical Programming.

[22]  Juan Peypouquet,et al.  Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity , 2018, Math. Program..

[23]  Y. Nesterov,et al.  Tensor methods for finding approximate stationary points of convex functions , 2019, Optim. Methods Softw..

[24]  Hedy Attouch,et al.  Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity , 2017 .

[25]  J. Bolte,et al.  On damped second-order gradient systems , 2014, 1411.8005.

[26]  Alexandre d'Aspremont,et al.  Integration Methods and Optimization Algorithms , 2017, NIPS.

[27]  Hedy Attouch,et al.  Fast Proximal Methods via Time Scaling of Damped Inertial Dynamics , 2019, SIAM J. Optim..

[28]  H. Attouch,et al.  An Inertial Proximal Method for Maximal Monotone Operators via Discretization of a Nonlinear Oscillator with Damping , 2001 .

[29]  Jean-François Aujol,et al.  The Differential Inclusion Modeling FISTA Algorithm and Optimality of Convergence Rate in the Case $b \leq 3$ , 2018, SIAM J. Optim..

[31]  F. Alvarez D.,et al.  A Dynamical System Associated with Newton's Method for Parametric Approximations of Convex Minimization Problems , 1998 .

[32]  H. Attouch,et al.  Convergence of damped inertial dynamics governed by regularized maximally monotone operators , 2018, Journal of Differential Equations.

[33]  Yurii Nesterov,et al.  Lectures on Convex Optimization , 2018 .

[34]  Yurii Nesterov,et al.  Regularized Newton Methods for Minimizing Functions with Hölder Continuous Hessians , 2017, SIAM J. Optim..

[35]  Yi Ma,et al.  Towards Unified Acceleration of High-Order Algorithms under Hölder Continuity and Uniform Convexity , 2019, ArXiv.

[36]  H. Attouch,et al.  A Dynamical Approach to Convex Minimization Coupling Approximation with the Steepest Descent Method , 1996 .

[37]  A Dynamical System Associated with Newton's Method for Parametric Approximations of Convex Minimization Problems , 2004 .

[38]  A. Antipin,et al.  MINIMIZATION OF CONVEX FUNCTIONS ON CONVEX SETS BY MEANS OF DIFFERENTIAL EQUATIONS , 2003 .

[39]  Othmane Sebbouh,et al.  Convergence Rates of Damped Inertial Dynamics under Geometric Conditions and Perturbations , 2020, SIAM J. Optim..

[41]  Brian Bullins,et al.  Highly smooth minimization of non-smooth problems , 2020, COLT.

[42]  Ashia C. Wilson,et al.  Accelerating Rescaled Gradient Descent , 2019, 1902.08825.

[43]  Radu Ioan Bot,et al.  Second Order Forward-Backward Dynamical Systems For Monotone Inclusion Problems , 2015, SIAM J. Control. Optim..

[44]  Benar Fux Svaiter,et al.  Newton-Like Dynamics and Forward-Backward Methods for Structured Monotone Inclusions in Hilbert Spaces , 2014, J. Optim. Theory Appl..

[45]  J. Lasalle Uniqueness Theorems and Successive Approximations , 1949 .

[46]  Convergence of Global and Bounded Solutions of a Second Order Gradient like System with Nonlinear Dissipation and Analytic Nonlinearity , 2008 .

[47]  E. Fašangová,et al.  Convergence to equilibrium for solutions of an abstract wave equation with general damping function , 2016 .

[48]  Alexandre M. Bayen,et al.  Accelerated Mirror Descent in Continuous and Discrete Time , 2015, NIPS.

[49]  Michael I. Jordan,et al.  A Dynamical Systems Perspective on Nesterov Acceleration , 2019, ICML.

[50]  Samir Adly,et al.  Finite Convergence of Proximal-Gradient Inertial Algorithms Combining Dry Friction with Hessian-Driven Damping , 2020, SIAM J. Optim..

[51]  Michael I. Jordan,et al.  On Symplectic Optimization , 2018, 1802.03653.

[52]  Renato D. C. Monteiro,et al.  An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and Its Implications to Second-Order Methods , 2013, SIAM J. Optim..

[53]  R. Rockafellar Monotone Operators and the Proximal Point Algorithm , 1976 .

[54]  Bo Jiang,et al.  A Unified Adaptive Tensor Approximation Scheme to Accelerate Composite Convex Optimization , 2020, SIAM J. Optim..

[55]  Bin Hu,et al.  Dissipativity Theory for Nesterov's Accelerated Method , 2017, ICML.

[56]  P. Dvurechensky,et al.  Tensor methods for strongly convex strongly concave saddle point problems and strongly monotone variational inequalities , 2020, Computer Research and Modeling.

[57]  R. Chill,et al.  Every ordinary differential equation with a strict Lyapunov function is a gradient system , 2012 .

[58]  Felipe Alvarez,et al.  On the Minimizing Property of a Second Order Dissipative System in Hilbert Spaces , 2000, SIAM J. Control. Optim..

[59]  H. Attouch,et al.  The Second-order in Time Continuous Newton Method , 2001 .

[60]  Andre Wibisono,et al.  Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions , 2019, NeurIPS.

[61]  Jelena Diakonikolas,et al.  The Approximate Duality Gap Technique: A Unified Theory of First-Order Methods , 2017, SIAM J. Optim..

[62]  Zaki Chbani,et al.  First-order optimization algorithms via inertial systems with Hessian driven damping , 2019, Mathematical Programming.

[63]  Paul-Emile Maingé First-Order Continuous Newton-like Systems for Monotone Inclusions , 2013, SIAM J. Control. Optim..

[64]  E. Coddington,et al.  Theory of Ordinary Differential Equations , 1955 .

[65]  H. Attouch,et al.  Convergence Rate of Proximal Inertial Algorithms Associated with Moreau Envelopes of Convex Functions , 2019, Splitting Algorithms, Modern Operator Theory, and Applications.

[66]  Y. Nesterov,et al.  Tensor Methods for Minimizing Functions with Hölder Continuous Higher-Order Derivatives , 2019 .

[67]  José Mario Martínez,et al.  On High-order Model Regularization for Constrained Optimization , 2017, SIAM J. Optim..

[68]  Benjamin Recht,et al.  Analysis and Design of Optimization Algorithms via Integral Quadratic Constraints , 2014, SIAM J. Optim..

[69]  J. Bolte,et al.  Characterizations of Łojasiewicz inequalities: Subgradient flows, talweg, convexity , 2009 .

[70]  Alejandro Ribeiro,et al.  Analysis of Optimization Algorithms via Integral Quadratic Constraints: Nonstrongly Convex Problems , 2017, SIAM J. Optim..

[71]  Michael I. Jordan,et al.  Generalized Momentum-Based Methods: A Hamiltonian Perspective , 2019, SIAM J. Optim..

[72]  Michael I. Jordan,et al.  On dissipative symplectic integration with applications to gradient-based optimization , 2020 .

[73]  H. Attouch,et al.  Fast convex optimization via time scaling of damped inertial gradient dynamics , 2020 .

[74]  Zeyuan Allen-Zhu,et al.  How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD , 2018, NeurIPS.

[75]  Yurii Nesterov,et al.  Superfast Second-Order Methods for Unconstrained Convex Optimization , 2020, Journal of Optimization Theory and Applications.

[76]  Y. Nesterov,et al.  Tensor Methods for Minimizing Convex Functions with Hölder Continuous Higher-Order Derivatives , 2019, SIAM J. Optim..

[77]  Renato D. C. Monteiro,et al.  On the Complexity of the Hybrid Proximal Extragradient Method for the Iterates and the Ergodic Mean , 2010, SIAM J. Optim..

[78]  Richard Peng,et al.  Higher-Order Accelerated Methods for Faster Non-Smooth Optimization , 2019, ArXiv.

[79]  H. Attouch,et al.  Fast convex optimization via inertial dynamics with Hessian driven damping , 2016, Journal of Differential Equations.

[80]  Osman Güler,et al.  New Proximal Point Algorithms for Convex Minimization , 1992, SIAM J. Optim..

[81]  A. Gasnikov,et al.  Near-Optimal Hyperfast Second-Order Method for convex optimization and its Sliding , 2020, 2002.09050.

[82]  Daniel P. Robinson,et al.  Conformal symplectic and relativistic optimization , 2019, NeurIPS.

[83]  H. Attouch,et al.  Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α ≤ 3 , 2017, ESAIM: Control, Optimisation and Calculus of Variations.

[84]  Hedy Attouch,et al.  The Rate of Convergence of Nesterov's Accelerated Forward-Backward Method is Actually Faster Than 1/k2 , 2015, SIAM J. Optim..

[85]  Michael I. Jordan,et al.  Acceleration via Symplectic Discretization of High-Resolution Differential Equations , 2019, NeurIPS.

[86]  M. Marques Alves Variants of the A-HPE and large-step A-HPE algorithms for strongly convex problems with applications to accelerated high-order tensor methods , 2021 .

[87]  José Mario Martínez,et al.  Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models , 2017, Math. Program..

[88]  C. Dossal,et al.  Nesterov's acceleration and Polyak's heavy ball method in continuous time: convergence rate analysis under geometric conditions and perturbations , 2019, 1907.02710.

[89]  Paul Tseng,et al.  Approximation accuracy, gradient methods, and error bound for structured convex optimization , 2010, Math. Program..

[90]  Ramzi May Asymptotic for a second order evolution equation with convex potential and vanishing damping term , 2015, 1509.05598.

[91]  M. Baes Estimate sequence methods: extensions and approximations , 2009 .

[92]  K. Kurdyka On gradients of functions definable in o-minimal structures , 1998 .

[93]  H. Attouch,et al.  Continuous Newton-like Inertial Dynamics for Monotone Inclusions , 2020, Set-Valued and Variational Analysis.

[94]  Yurii Nesterov,et al.  Accelerated Regularized Newton Methods for Minimizing Composite Convex Functions , 2019, SIAM J. Optim..

[95]  M. Solodov,et al.  A Hybrid Approximate Extragradient – Proximal Point Algorithm Using the Enlargement of a Maximal Monotone Operator , 1999 .

[96]  B. Svaiter,et al.  A dynamic approach to a proximal-Newton method for monotone inclusions in Hilbert spaces, with complexity O(1/n^2) , 2015, 1502.04286.

[97]  Brendan O'Donoghue,et al.  Hamiltonian descent for composite objectives , 2019, NeurIPS.

[98]  H. Attouch,et al.  A second-order differential system with hessian-driven damping; application to non-elastic shock laws , 2012 .

[99]  Nicholas I. M. Gould,et al.  Universal regularization methods - varying the power, the smoothness and the accuracy , 2018, 1811.07057.

[100]  Radu Ioan Bot,et al.  Tikhonov regularization of a second order dynamical system with Hessian driven damping , 2019, Math. Program..

[101]  Nicholas I. M. Gould,et al.  Second-Order Optimality and Beyond: Characterization and Evaluation Complexity in Convexly Constrained Nonlinear Optimization , 2018, Found. Comput. Math..

[102]  Stephen P. Boyd,et al.  A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..

[103]  O. Nelles,et al.  An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.

[104]  Benar Fux Svaiter,et al.  A Continuous Dynamical Newton-Like Approach to Solving Monotone Inclusions , 2011, SIAM J. Control. Optim..

[106]  Aryan Mokhtari,et al.  Direct Runge-Kutta Discretization Achieves Acceleration , 2018, NeurIPS.

[107]  Kevin A. Lai,et al.  Higher-order methods for convex-concave min-max optimization and monotone variational inequalities , 2020, SIAM J. Optim..

[108]  H. Attouch,et al.  The Heavy Ball with Friction Method, I. The Continuous Dynamical System: Global Exploration of the Local Minima of a Real-Valued Function by Asymptotic Analysis of a Dissipative Dynamical System , 2000 .

[109]  Hedy Attouch,et al.  Convergence of a relaxed inertial proximal algorithm for maximally monotone operators , 2019, Mathematical Programming.

[110]  José Mario Martínez,et al.  Evaluation Complexity for Nonlinear Constrained Optimization Using Unscaled KKT Conditions and High-Order Models , 2016, SIAM J. Optim..

[111]  Local convergence of tensor methods , 2019, 1912.02516.

[112]  Yurii Nesterov,et al.  Gradient methods for minimizing composite functions , 2012, Mathematical Programming.

[113]  Shuzhong Zhang,et al.  An Optimal High-Order Tensor Method for Convex Optimization , 2019, COLT.

[114]  K. Deimling Fixed Point Theory , 2008 .

[115]  Y. Nesterov A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .