Convergence of Learning Dynamics in Stackelberg Games

This paper investigates the convergence of learning dynamics in Stackelberg games. In the class of games we consider, a hierarchical game is played between a leader and a follower with continuous action spaces. We establish a number of connections between the Nash and Stackelberg equilibrium concepts and characterize conditions under which attracting critical points of simultaneous gradient descent are Stackelberg equilibria in zero-sum games. Moreover, we show that in zero-sum games the only stable critical points of the Stackelberg gradient dynamics are Stackelberg equilibria. Using this insight, we develop a gradient-based update for the leader, with the follower playing a best response, for which every stable critical point is guaranteed to be a Stackelberg equilibrium in zero-sum games. As a result, the learning rule provably converges to a Stackelberg equilibrium given an initialization in the region of attraction of a stable critical point. We then consider a follower employing a gradient-play update rule instead of a best response strategy and propose a two-timescale algorithm with similar asymptotic convergence guarantees. For this algorithm, we also provide finite-time high-probability bounds for local convergence to a neighborhood of a stable Stackelberg equilibrium in general-sum games. Finally, we present extensive numerical results that validate our theory, provide insights into the optimization landscape of generative adversarial networks, and demonstrate that the learning dynamics we propose can effectively train generative adversarial networks.
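
To make the two-timescale dynamics concrete, the following minimal sketch (an illustration on an assumed toy quadratic zero-sum game, not the paper's exact algorithm or step sizes) has the follower run gradient ascent on a faster timescale while the leader descends the Stackelberg gradient obtained by differentiating through the follower's first-order condition via the implicit function theorem.

```python
# Minimal sketch (assumed toy example) of two-timescale Stackelberg learning on the
# zero-sum quadratic game f(x, y) = 0.5*a*x^2 + b*x*y - 0.5*c*y^2.
# The leader minimizes f over x; the follower maximizes f over y.
import numpy as np

a, b, c = 1.0, 2.0, 3.0           # assumed game parameters (D_yy f = -c < 0)

def grad_x(x, y):                  # partial derivative of f with respect to x
    return a * x + b * y

def grad_y(x, y):                  # partial derivative of f with respect to y
    return b * x - c * y

def stackelberg_grad_x(x, y):
    # Leader's total derivative, obtained by applying the implicit function theorem
    # to the follower's stationarity condition grad_y f = 0:
    #   D f(x) = d_x f - d_xy f * (d_yy f)^{-1} * d_y f
    d_xy, d_yy = b, -c
    return grad_x(x, y) - d_xy * (1.0 / d_yy) * grad_y(x, y)

x, y = 1.0, -1.0                   # arbitrary initialization
gamma_leader, gamma_follower = 0.01, 0.1   # follower on the faster timescale

for _ in range(5000):
    y += gamma_follower * grad_y(x, y)            # follower: gradient ascent on f
    x -= gamma_leader * stackelberg_grad_x(x, y)  # leader: Stackelberg gradient descent

print(x, y)  # both iterates approach the Stackelberg equilibrium at the origin
```

In this toy example the leader step size is kept an order of magnitude smaller than the follower's, so the follower effectively tracks its best response y = (b/c)x, mirroring the timescale separation assumed in the convergence analysis.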
