Gradient Descent-Ascent Provably Converges to Strict Local Minmax Equilibria with a Finite Timescale Separation

We study the role that a finite timescale separation parameter $\tau$ plays in gradient descent-ascent for two-player non-convex, non-concave zero-sum games, where the learning rate of player 1 is denoted by $\gamma_1$ and the learning rate of player 2 is defined to be $\gamma_2=\tau\gamma_1$. Existing work analyzing the role of timescale separation in gradient descent-ascent has primarily focused on two edge cases: the players sharing a learning rate ($\tau =1$), and the maximizing player approximately converging between each update of the minimizing player ($\tau \rightarrow \infty$). For the parameter choice of $\tau=1$, it is known that the learning dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, Jin et al. (2020) showed that the stable critical points of gradient descent-ascent coincide with the set of strict local minmax equilibria as $\tau\rightarrow\infty$. In this work, we bridge the gap between past work by showing there exists a finite timescale separation parameter $\tau^{\ast}$ such that a point $x^{\ast}$ is a stable critical point of gradient descent-ascent for all $\tau \in (\tau^{\ast}, \infty)$ if and only if it is a strict local minmax equilibrium. Moreover, we provide an explicit construction for computing $\tau^{\ast}$ along with corresponding convergence rates and results under deterministic and stochastic gradient feedback. The convergence results we present are complemented by a non-convergence result: given a critical point $x^{\ast}$ that is not a strict local minmax equilibrium, there exists a finite timescale separation $\tau_0$ such that $x^{\ast}$ is unstable for all $\tau\in (\tau_0, \infty)$. Finally, we empirically demonstrate on the CIFAR-10 and CelebA datasets that timescale separation has a significant impact on training performance.
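To make the role of $\tau$ concrete, the following is a minimal sketch on a toy problem, not the construction from the paper. It assumes a hypothetical scalar quadratic game $f(x, y) = -\tfrac{1}{2}x^2 + 2xy - \tfrac{1}{2}y^2$, for which the origin is a strict local minmax equilibrium ($D_y^2 f < 0$ and the Schur complement is positive) but not a local Nash equilibrium. For this particular game the Jacobian of the $\tau$-scaled dynamics has trace $1-\tau$ and determinant $3\tau$, so the origin is stable exactly when $\tau > 1$. The function names, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy game (an assumption for illustration):
# f(x, y) = 0.5*A*x**2 + B*x*y + 0.5*C*y**2 with A = -1, B = 2, C = -1.
A, B, C = -1.0, 2.0, -1.0

def is_strict_local_minmax():
    # Second-order conditions at the origin: D_y^2 f < 0 and
    # Schur complement D_x^2 f - D_xy f (D_y^2 f)^{-1} D_yx f > 0.
    return C < 0 and (A - B * B / C) > 0

def gda_jacobian(tau):
    # Jacobian of the vector field (-D_x f, tau * D_y f) at the origin.
    return np.array([[-A, -B], [tau * B, tau * C]])

def is_stable(tau):
    # Continuous-time stability: all eigenvalues have negative real part.
    return bool(np.max(np.linalg.eigvals(gda_jacobian(tau)).real) < 0)

def run_gda(tau, gamma=0.01, steps=5000, z0=(1.0, 1.0)):
    # Simultaneous gradient descent-ascent with gamma_1 = gamma and
    # gamma_2 = tau * gamma; returns the final distance to the origin.
    x, y = z0
    for _ in range(steps):
        gx = A * x + B * y  # D_x f
        gy = B * x + C * y  # D_y f
        x, y = x - gamma * gx, y + tau * gamma * gy
    return float(np.hypot(x, y))

if __name__ == "__main__":
    print("strict local minmax:", is_strict_local_minmax())
    for tau in (0.5, 2.0):
        print(tau, is_stable(tau), run_gda(tau))
```

In this sketch the iterates spiral away from the origin for $\tau = 0.5$ and contract toward it for $\tau = 2$, which matches the eigenvalue test. The threshold of 1 is specific to this toy game and a sufficiently small step size; it should not be read as the general value of $\tau^{\ast}$.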

[1]  T. Başar,et al.  Dynamic Noncooperative Game Theory , 1982 .

[2]  Thomas Hofmann,et al.  Local Saddle Point Optimization: A Curvature Exploitation Approach , 2018, AISTATS.

[3]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[4]  Lillian J. Ratliff,et al.  Convergence of Learning Dynamics in Stackelberg Games , 2019, ArXiv.

[5]  Jason D. Lee,et al.  Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods , 2019, NeurIPS.

[6]  Roy M. Howard,et al.  Linear System Theory , 1992 .

[7]  Eyad H. Abed,et al.  Generalized Stability of Linear Singularly Perturbed Systems Including Calculation of Maximal Parameter Range , 1990 .

[8]  Constantinos Daskalakis,et al.  The complexity of constrained min-max optimization , 2020, STOC.

[9]  Meisam Razaviyayn,et al.  Efficient Search of First-Order Nash Equilibria in Nonconvex-Concave Smooth Min-Max Problems , 2021, SIAM J. Optim.

[10]  Lahcen Saydy.  New stability/performance results for singularly perturbed systems , 1996, Autom.

[11]  Aleksander Madry,et al.  Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.

[12]  Michael I. Jordan,et al.  On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems , 2019, ICML.

[13]  Willy Govaerts,et al.  Numerical methods for bifurcations of dynamical equilibria , 1987 .

[14]  Shimon Whiteson,et al.  Learning with Opponent-Learning Awareness , 2017, AAMAS.

[15]  C. Tretter Spectral Theory Of Block Operator Matrices And Applications , 2008 .

[16]  S. Shankar Sastry,et al.  On Finding Local Nash Equilibria (and Only Local Nash Equilibria) in Zero-Sum Games , 2019, arXiv:1901.00838.

[17]  Sebastian Nowozin,et al.  Stabilizing Training of Generative Adversarial Networks through Regularization , 2017, NIPS.

[18]  João Pedro Hespanha,et al.  Linear Systems Theory , 2009 .

[19]  Ashish Cherukuri,et al.  Saddle-Point Dynamics: Conditions for Asymptotic Stability of Saddle Points , 2015, SIAM J. Control. Optim.

[20]  David Duvenaud,et al.  Optimizing Millions of Hyperparameters by Implicit Differentiation , 2019, AISTATS.

[21]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[22]  Sebastian Nowozin,et al.  Which Training Methods for GANs do actually Converge? , 2018, ICML.

[23]  F. Takens,et al.  Preliminaries of Dynamical Systems Theory , 2010 .

[24]  Michael I. Jordan,et al.  What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization? , 2019, ICML.

[25]  S. Shankar Sastry,et al.  Characterization and computation of local Nash equilibria in continuous games , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[26]  Asuman Ozdaglar,et al.  Do GANs always have Nash equilibria? , 2020, ICML.

[27]  Thore Graepel,et al.  Differentiable Game Mechanics , 2019, J. Mach. Learn. Res.

[28]  S. Shankar Sastry,et al.  On the Characterization of Local Nash Equilibria in Continuous Games , 2014, IEEE Transactions on Automatic Control.

[29]  J. Zico Kolter,et al.  Gradient descent GAN optimization is locally stable , 2017, NIPS.

[30]  Eyad H. Abed,et al.  Guardian maps and the generalized stability of parametrized families of matrices and polynomials , 1990, Math. Control. Signals Syst.

[31]  I. Argyros A generalization of Ostrowski's theorem on fixed points , 1999 .

[32]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[33]  Francis J. Doyle,et al.  Nonlinear systems theory , 1997 .

[34]  Yongxin Chen,et al.  Hybrid Block Successive Approximation for One-Sided Non-Convex Min-Max Problems: Algorithms and Applications , 2019, IEEE Transactions on Signal Processing.

[35]  Victor R. Lesser,et al.  Multi-Agent Learning with Policy Prediction , 2010, AAAI.

[36]  Michael I. Jordan,et al.  Near-Optimal Algorithms for Minimax Optimization , 2020, COLT.

[37]  Roger B. Grosse,et al.  Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions , 2019, ICLR.

[38]  Mingrui Liu,et al.  Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning , 2018, ArXiv.

[39]  Gauthier Gidel,et al.  A Variational Inequality Perspective on Generative Adversarial Networks , 2018, ICLR.

[40]  Georgios Piliouras,et al.  Game dynamics as the meaning of a game , 2019, SECO.

[41]  David Pfau,et al.  Unrolled Generative Adversarial Networks , 2016, ICLR.

[42]  S. Shankar Sastry,et al.  On Gradient-Based Learning in Continuous Games , 2018, SIAM J. Math. Data Sci.

[43]  Aravind Rajeswaran,et al.  A Game Theoretic Framework for Model Based Reinforcement Learning , 2020, ICML.

[44]  Chuan-Sheng Foo,et al.  Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile , 2018, ICLR.

[45]  P. Lancaster,et al.  The theory of matrices : with applications , 1985 .

[46]  Hassan K. Khalil,et al.  Singular perturbation methods in control : analysis and design , 1986 .

[47]  Thore Graepel,et al.  The Mechanics of n-Player Differentiable Games , 2018, ICML.

[48]  S. Sastry,et al.  Jump behavior of circuits and systems , 1981, 1981 20th IEEE Conference on Decision and Control including the Symposium on Adaptive Processes.

[49]  Pascal Vincent,et al.  A Closer Look at the Optimization Landscapes of Generative Adversarial Networks , 2019, ICLR.

[50]  Lillian J. Ratliff,et al.  Convergence Analysis of Gradient-Based Learning in Continuous Games , 2019, UAI.

[51]  Sameer Kamal,et al.  On the Convergence, Lock-In Probability, and Sample Complexity of Stochastic Approximation , 2010, SIAM J. Control. Optim.

[52]  Ioannis Mitliagkas,et al.  Negative Momentum for Improved Game Dynamics , 2018, AISTATS.

[53]  John C. Duchi,et al.  Certifying Some Distributional Robustness with Principled Adversarial Training , 2017, ICLR.

[54]  Lillian Ratliff,et al.  Local Nash Equilibria are Isolated, Strict Local Nash Equilibria in ‘Almost All’ Zero-Sum Continuous Games , 2019, 2019 IEEE 58th Conference on Decision and Control (CDC).

[55]  P. Olver Nonlinear Systems , 2013 .

[56]  S. Shankar Sastry,et al.  Genericity and structural stability of non-degenerate differential Nash equilibria , 2014, 2014 American Control Conference.

[57]  R. G. Casten,et al.  Basic Concepts Underlying Singular Perturbation Techniques , 1972 .

[58]  Tanner Fiez,et al.  Implicit Learning Dynamics in Stackelberg Games: Equilibria Characterization, Convergence Analysis, and Empirical Study , 2020, ICML.

[59]  Constantinos Daskalakis,et al.  Training GANs with Optimism , 2017, ICLR.

[60]  Tamer Basar,et al.  Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.

[61]  Constantinos Daskalakis,et al.  The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization , 2018, NeurIPS.

[62]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[63]  Shimon Whiteson,et al.  Stable Opponent Shaping in Differentiable Games , 2018, ICLR.

[64]  D. Fudenberg,et al.  The Theory of Learning in Games , 1998 .

[65]  James M. Ortega,et al.  Iterative solution of nonlinear equations in several variables , 2014, Computer science and applied mathematics.

[66]  Sebastian Nowozin,et al.  The Numerics of GANs , 2017, NIPS.

[67]  M. Benaïm A Dynamical System Approach to Stochastic Approximations , 1996 .

[68]  D. Mustafa,et al.  Generalized integral controllability , 1994, Proceedings of 1994 33rd IEEE Conference on Decision and Control.

[69]  Guodong Zhang,et al.  On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach , 2019, ICLR.

[70]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[71]  John M Alongi,et al.  Recurrence and Topology , 2007 .