Discretization Drift in Two-Player Games

Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand. Part of this complexity originates from the discrete update steps given by simultaneous or alternating gradient descent, which cause each player to drift away from the continuous gradient flow, a phenomenon we call discretization drift. Using backward error analysis, we derive modified continuous dynamical systems that closely follow the discrete dynamics. These modified dynamics provide insight into the notorious challenges associated with zero-sum games, including Generative Adversarial Networks. In particular, we identify distinct components of the discretization drift that can alter performance and, in some cases, destabilize the game. Finally, quantifying the discretization drift allows us to identify regularizers that explicitly cancel harmful forms of drift or strengthen beneficial forms of drift, and thus improve the performance of GAN training.
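The drift described above is visible even on the simplest zero-sum game. Below is a minimal JAX sketch (an illustration written for this summary, not the paper's code) contrasting simultaneous and alternating gradient descent-ascent on the bilinear game f(x, y) = xy; its continuous gradient flow orbits the equilibrium at the origin at constant radius, so any change in radius measures accumulated drift. The learning rate and step count are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): discretization
# drift on the bilinear zero-sum game f(x, y) = x * y. The continuous
# flow dx/dt = -y, dy/dt = x conserves x**2 + y**2, so deviations of the
# radius from its initial value measure the accumulated drift.
import jax
import jax.numpy as jnp

def f(x, y):
    return x * y  # zero-sum objective: x descends f, y ascends f

grad_x = jax.grad(f, argnums=0)  # player-1 gradient
grad_y = jax.grad(f, argnums=1)  # player-2 gradient

def simultaneous_step(x, y, lr):
    # Both players update from the same iterate (simultaneous GDA).
    return x - lr * grad_x(x, y), y + lr * grad_y(x, y)

def alternating_step(x, y, lr):
    # The second player sees the first player's fresh update
    # (alternating GDA).
    x_next = x - lr * grad_x(x, y)
    return x_next, y + lr * grad_y(x_next, y)

lr, steps = 0.1, 200  # illustrative hyperparameters
for name, step_fn in [("simultaneous", simultaneous_step),
                      ("alternating", alternating_step)]:
    x, y = jnp.float32(1.0), jnp.float32(0.0)  # start on the unit circle
    for _ in range(steps):
        x, y = step_fn(x, y, lr)
    print(name, "final radius:", float(jnp.sqrt(x**2 + y**2)))
```

On this game the simultaneous updates spiral outward (each step multiplies the radius by sqrt(1 + lr^2), so the drift is destabilizing here), while the alternating updates remain bounded near the unit circle, consistent with the claim that distinct components of the drift affect the two update schemes differently.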
