Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Since the game can be formulated as a minimax optimization problem, a natural approach is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, the theoretical understanding of this method is limited. In this paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that, under a proper choice of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of gradient descent ascent without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
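
To make the idea concrete, below is a minimal sketch of entropy-regularized gradient descent ascent on the single-state special case, a zero-sum matrix game with payoff matrix A. The regularized objective is f_tau(x, y) = x^T A y - tau*H(x) + tau*H(y), where H denotes the Shannon entropy and x, y are the players' mixed strategies. The function name `entropy_regularized_gda`, the step size `eta`, the parameter values, and the multiplicative-weights (entropic mirror) update are illustrative choices of ours, not the exact algorithm or parameters analyzed in the paper.

```python
import numpy as np

def entropy_regularized_gda(A, tau=0.05, eta=0.05, iters=5000):
    """Alternating mirror descent/ascent on the entropy-regularized matrix game
        min_x max_y  x^T A y - tau*H(x) + tau*H(y),
    where H is the Shannon entropy and x, y are simplex vectors.
    Illustrative sketch only; names and step sizes are not taken from the paper."""
    m, n = A.shape
    x = np.full(m, 1.0 / m)   # min player's mixed strategy
    y = np.full(n, 1.0 / n)   # max player's mixed strategy
    for _ in range(iters):
        # Descent step for the min player, using the current y.
        # Note -tau*H(x) = tau * sum_i x_i log x_i, so its gradient is tau*(log x + 1).
        gx = A @ y + tau * (np.log(x) + 1.0)
        x = x * np.exp(-eta * gx)
        x /= x.sum()              # entropic mirror step keeps x on the simplex
        # Ascent step for the max player, using the freshly updated x.
        gy = A.T @ x - tau * (np.log(y) + 1.0)
        y = y * np.exp(eta * gy)
        y /= y.sum()
    return x, y

if __name__ == "__main__":
    # Matching pennies: by symmetry the regularized equilibrium is uniform play.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]])
    x, y = entropy_regularized_gda(A)
    print("x* ~", x, "y* ~", y)   # both should be close to [0.5, 0.5]
```

On matching pennies the regularized and unregularized equilibria coincide by symmetry, so both iterates should approach the uniform strategy; in general the regularization parameter tau must be chosen small enough for the regularized solution to approximate the Nash equilibrium of the unregularized game, which is the trade-off the paper's choice of tau addresses.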
