Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games

We show by counterexample that policy-gradient algorithms have no guarantees of even local convergence to Nash equilibria in continuous action and state space multi-agent settings. To do so, we analyze gradient-play in N-player general-sum linear quadratic (LQ) games, a classic game setting that has recently emerged as a benchmark in the field of multi-agent learning. In such games the state and action spaces are continuous and global Nash equilibria can be found by solving coupled Riccati equations. Further, gradient-play in LQ games is equivalent to multi-agent policy gradient. We first show that, surprisingly, these games are not convex. Despite this, we are still able to show that the only critical points of the gradient dynamics are global Nash equilibria. We then give sufficient conditions under which policy gradient will avoid the Nash equilibria, and generate a large number of general-sum linear quadratic games that satisfy these conditions. In such games we empirically observe the players converging to limit cycles whose time average does not coincide with a Nash equilibrium. The existence of such games indicates that one of the most popular approaches to solving problems in the classic single-agent reinforcement learning setting has no local convergence guarantee in multi-agent settings. Further, the ease with which we can generate these counterexamples suggests that such failures are not mere edge cases and are in fact quite common.
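To make the setup concrete, below is a minimal, hedged sketch of gradient-play (simultaneous policy gradient on linear feedback gains) in a scalar two-player general-sum LQ game. All constants (the dynamics A, B1, B2, the cost weights Q1, R1, Q2, R2, the initial state, the initial gains, and the step size) are illustrative assumptions and are not taken from the paper; the snippet only illustrates the update rule the abstract refers to, not the paper's specific counterexamples.

```python
# Minimal sketch of gradient-play in a scalar two-player general-sum LQ game.
# All constants below (A, B1, B2, Q1, R1, Q2, R2, x0, the initial gains, and
# the step size) are illustrative assumptions, not values from the paper.
import numpy as np

# Shared dynamics: x_{t+1} = A x_t + B1 u1_t + B2 u2_t, with linear feedback
# policies u_{i,t} = -K_i x_t for each player i.
A, B1, B2 = 0.9, 0.8, 1.2
Q1, R1 = 1.0, 0.1     # player 1's state / control cost weights
Q2, R2 = 0.5, 0.2     # player 2's state / control cost weights
x0 = 1.0              # initial state


def cost(K, i):
    """Player i's infinite-horizon cost J_i(K1, K2) starting from x0.

    With u_i = -K_i x the closed loop is x_{t+1} = a_cl x_t, where
    a_cl = A - B1*K1 - B2*K2, so J_i = (Q_i + R_i K_i^2) x0^2 / (1 - a_cl^2)
    whenever |a_cl| < 1 (a geometric series); otherwise the cost is infinite.
    """
    K1, K2 = K
    a_cl = A - B1 * K1 - B2 * K2
    if abs(a_cl) >= 1.0:
        return np.inf
    Q, R, Ki = (Q1, R1, K1) if i == 0 else (Q2, R2, K2)
    return (Q + R * Ki ** 2) * x0 ** 2 / (1.0 - a_cl ** 2)


def own_gradient(K, i, eps=1e-6):
    """Finite-difference gradient of player i's cost w.r.t. its own gain K_i."""
    K_plus, K_minus = K.copy(), K.copy()
    K_plus[i] += eps
    K_minus[i] -= eps
    return (cost(K_plus, i) - cost(K_minus, i)) / (2.0 * eps)


# Gradient-play: both players simultaneously take a gradient step on their own
# cost, treating the other player's current gain as fixed.
K = np.array([0.2, 0.3])   # initial stabilizing gains (assumed)
lr = 0.01
for step in range(5000):
    g = np.array([own_gradient(K, 0), own_gradient(K, 1)])
    if not np.isfinite(g).all():
        print(f"left the stabilizing region at step {step}")
        break
    K = K - lr * g

print("final gains:", K)
```

At a Nash equilibrium both own-cost gradients vanish, so a fixed point of this loop is a candidate equilibrium; the abstract's claim is that for many general-sum LQ games this simultaneous update never settles and instead orbits a limit cycle whose time average is not a Nash equilibrium.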
