Optimizing Deep Neural Networks via Discretization of Finite-Time Convergent Flows

In this paper, we investigate the performance of several discretization algorithms for two first-order finite-time optimization flows in the context of deep neural networks. These flows, namely the rescaled-gradient flow (RGF) and the signed-gradient flow (SGF), are non-Lipschitz or discontinuous dynamical systems that converge locally in finite time to the minima of gradient-dominated functions. We introduce three discretization methods for these first-order finite-time flows and provide convergence guarantees. We then apply the proposed algorithms to training neural networks and empirically test their performance on three standard datasets: CIFAR10, SVHN, and MNIST. Our results show that our schemes converge faster than standard optimization alternatives while achieving equivalent or better accuracy.
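To make the two flows concrete, the following is a minimal sketch of a plain forward-Euler discretization. It is not necessarily any of the three schemes proposed in the paper. It assumes the q-parameterized forms dx/dt = -c * grad f(x) / ||grad f(x)||^((q-2)/(q-1)) for RGF and dx/dt = -c * ||grad f(x)||_1^(1/(q-1)) * sign(grad f(x)) for SGF, with c > 0 and q > 1. The helper names (`rgf_step`, `sgf_step`, `run`), the step size `eta`, and the toy quadratic objective are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical helpers: the flow forms below are ASSUMED q-parameterized
# versions of RGF and SGF, discretized with plain forward Euler.


def rgf_step(x, grad, eta=0.1, c=1.0, q=3.0, eps=1e-12):
    """One forward-Euler step of the (assumed) q-rescaled gradient flow
    dx/dt = -c * g / ||g||^((q-2)/(q-1)), with g = grad f(x)."""
    g = grad(x)
    alpha = (q - 2.0) / (q - 1.0)  # q = 2 recovers a plain gradient step
    # eps keeps the scale finite when the gradient vanishes
    scale = max(np.linalg.norm(g), eps) ** alpha
    return x - eta * c * g / scale


def sgf_step(x, grad, eta=0.1, c=1.0, q=3.0):
    """One forward-Euler step of the (assumed) q-signed gradient flow
    dx/dt = -c * ||g||_1^(1/(q-1)) * sign(g), with g = grad f(x)."""
    g = grad(x)
    beta = 1.0 / (q - 1.0)
    return x - eta * c * np.linalg.norm(g, 1) ** beta * np.sign(g)


def run(step, x0, grad, iters=200, **kwargs):
    """Iterate a discretized flow from x0 and return the final iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = step(x, grad, **kwargs)
    return x


if __name__ == "__main__":
    # Toy gradient-dominated objective f(x) = 0.5 * ||x||^2, so grad f(x) = x.
    grad = lambda x: x
    x0 = [3.0, -4.0]
    print("RGF:", run(rgf_step, x0, grad))
    print("SGF:", run(sgf_step, x0, grad))
```

With a fixed step size, the iterates do not land exactly on the minimizer. Instead they settle into a small oscillation around it whose size shrinks as `eta` decreases, which is the chattering typical of discretized non-Lipschitz or discontinuous flows.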