Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent

Nash equilibrium is a central concept in game theory. Several Nash solvers exist, yet none scale to normal-form games with many actions and many players, especially those with payoff tensors too big to be stored in memory. In this work, we propose an approach that iteratively improves an approximation to a Nash equilibrium through joint play. It accomplishes this by tracing a previously established homotopy that defines a continuum of equilibria for the game regularized with decaying levels of entropy. This continuum asymptotically approaches the limiting logit equilibrium, proven by McKelvey and Palfrey (1995) to be unique in almost all games, thereby partially circumventing the well-known equilibrium selection problem of many-player games. To encourage iterates to remain near this path, we efficiently minimize average deviation incentive via stochastic gradient descent, intelligently sampling entries in the payoff tensor as needed. Monte Carlo estimates of the stochastic gradient from joint play are biased due to the appearance of a nonlinear max operator in the objective, so we introduce additional innovations to the algorithm to alleviate gradient bias. The descent process can also be viewed as repeatedly constructing and reacting to a polymatrix approximation to the game. In these ways, our proposed approach, average deviation incentive descent with adaptive sampling (ADIDAS), is most similar to three classical approaches, namely homotopy-type, Lyapunov, and iterative polymatrix solvers. The lack of local convergence guarantees for biased gradient descent prevents guaranteed convergence to Nash; however, we demonstrate through extensive experiments that this approach can approximate a unique Nash equilibrium in normal-form games with as many as seven players and twenty-one actions (several billion outcomes), orders of magnitude larger than those tractable with prior algorithms.
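To make the objective in the abstract concrete, below is a minimal sketch, not the authors' released implementation: it evaluates the entropy-regularized average deviation incentive on a two-player normal-form game with full payoff matrices and exact expectations. ADIDAS instead estimates this quantity from sampled payoff entries and descends it with a bias-corrected stochastic gradient; the update loop here is a crude stand-in that anneals the entropy level (the homotopy) while nudging each player toward its regularized best response. All names and parameter values are illustrative assumptions.

```python
# Sketch only (our illustration, not the ADIDAS code): entropy-regularized
# average deviation incentive (ADI) for a two-player normal-form game.
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

def avg_dev_incentive(A, B, x, y, tau):
    """Average gain from deviating to the tau-regularized (logit) best response."""
    u1 = A @ y            # player 1's expected payoff for each pure action
    u2 = B.T @ x          # player 2's expected payoff for each pure action
    adi = 0.0
    for u, p in ((u1, x), (u2, y)):
        br = softmax(u / tau)   # entropy-regularized best response
        adi += (br @ u + tau * entropy(br)) - (p @ u + tau * entropy(p))
    return adi / 2.0

# Toy 3x3 game; anneal tau toward zero while tracking regularized best responses.
rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
x = y = np.ones(3) / 3
for t in range(300):
    tau = max(0.05, 0.98 ** t)                    # decaying entropy level
    x = 0.9 * x + 0.1 * softmax((A @ y) / tau)    # damped move toward logit BR
    y = 0.9 * y + 0.1 * softmax((B.T @ x) / tau)
print(avg_dev_incentive(A, B, x, y, tau=0.05))    # small value indicates a near-QRE
```

The ADI is nonnegative by construction, since the logit best response maximizes the entropy-regularized expected payoff; driving it to zero at a small temperature yields an approximate quantal response equilibrium near the limiting logit equilibrium the paper targets.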

[1] Wojciech M. Czarnecki, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning, 2019, Nature.

[2] Sriram Srinivasan, et al. OpenSpiel: A Framework for Reinforcement Learning in Games, 2019, arXiv.

[3] Xiaotie Deng, et al. Settling the Complexity of Two-Player Nash Equilibrium, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[4] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.

[5] John Fearnley, et al. An Empirical Study on Computing Equilibria in Polymatrix Games, 2016, AAMAS.

[6] H. Kuk. On equilibrium points in bimatrix games, 1996.

[7] Sebastian Nowozin, et al. The Numerics of GANs, 2017, NIPS.

[8] Adam Lerer, et al. Combining Deep Reinforcement Learning and Search for Imperfect-Information Games, 2020, NeurIPS.

[9] Kousha Etessami, et al. On the Complexity of Nash Equilibria and Other Fixed Points, 2010, SIAM J. Comput.

[10] Xiaotie Deng, et al. Settling the complexity of computing two-player Nash equilibria, 2007, JACM.

[11] Paul G. Spirakis, et al. Computing Approximate Nash Equilibria in Polymatrix Games, 2015, Algorithmica.

[12] R. McKelvey, et al. Computation of equilibria in finite games, 1996.

[13] Paul W. Goldberg, et al. The complexity of computing a Nash equilibrium, 2006, STOC '06.

[14] Philip B. Stark, et al. From Battlefields to Elections: Winning Strategies of Blotto and Auditing Games, 2018, SODA.

[15] Karl Tuyls, et al. Computing Approximate Equilibria in Sequential Adversarial Games by Exploitability Descent, 2019, IJCAI.

[16] Mohammad Taghi Hajiaghayi, et al. Faster and Simpler Algorithm for Optimal Strategies of Blotto Game, 2016, AAAI.

[17] Daphne Koller, et al. A Continuation Method for Nash Equilibria in Structured Games, 2003, IJCAI.

[18] Michael P. Wellman, et al. Empirical game-theoretic analysis of the TAC Supply Chain game, 2007, AAMAS '07.

[19] D. Whitehead, et al. The El Farol Bar Problem Revisited: Reinforcement Learning in a Potential Game, 2008.

[20] A. Juditsky, et al. Solving variational inequalities with Stochastic Mirror-Prox algorithm, 2008, arXiv:0809.0815.

[21] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.

[22] Robert Wilson, et al. A global Newton method to compute Nash equilibria, 2003, J. Econ. Theory.

[23] Mohammad Taghi Hajiaghayi, et al. From Duels to Battlefields: Computing Equilibria of Blotto and Other Games, 2016, AAAI.

[24] F. Facchinei, et al. Finite-Dimensional Variational Inequalities and Complementarity Problems, 2003.

[25] Alain Durmus, et al. Convergence Analysis of Riemannian Stochastic Approximation Schemes, 2020, arXiv.

[26] Yoav Shoham, et al. Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, 2009.

[27] Jie Chen, et al. Stochastic Gradient Descent with Biased but Consistent Gradient Estimators, 2018, arXiv.

[28] Georg Ostrovski, et al. Payoff Performance of Fictitious Play, 2013, arXiv.

[29] Neil Burch, et al. Heads-up limit hold’em poker is solved, 2015, Science.

[30] Robert Wilson, et al. Computing Nash equilibria by iterated polymatrix approximation, 2004.

[31] Jonathan Gray, et al. Human-Level Performance in No-Press Diplomacy via Equilibrium Search, 2020, ICLR.

[32] D. Blackwell. An analog of the minimax theorem for vector payoffs, 1956.

[33] David M. Pennock, et al. An Empirical Game-Theoretic Analysis of Price Discovery in Prediction Markets, 2016, IJCAI.

[34] Ayala Arad, et al. Multi-dimensional iterative reasoning in action: The case of the Colonel Blotto game, 2012.

[35] Marc Teboulle, et al. Mirror descent and nonlinear projected subgradient methods for convex optimization, 2003, Oper. Res. Lett.

[36] Avrim Blum, et al. Planning in the Presence of Cost Functions Controlled by an Adversary, 2003, ICML.

[37] Cao Xiao, et al. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling, 2018, ICLR.

[38] Troels Bjerre Lund, et al. On the approximation performance of fictitious play in finite games, 2011, Int. J. Game Theory.

[39] W. Arthur. Complexity in economic theory: inductive reasoning and bounded rationality, 1994.

[40] Georgios Piliouras, et al. No-regret learning and mixed Nash equilibria: They do not mix, 2020, NeurIPS.

[41] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.

[42] Michael P. Wellman. Methods for Empirical Game-Theoretic Analysis, 2006, AAAI.

[43] G. M. Korpelevich. The extragradient method for finding saddle points and other problems, 1976.

[44] Christos H. Papadimitriou, et al. Cycles in adversarial regularized learning, 2017, SODA.

[45] Yoav Shoham, et al. Simple search methods for finding a Nash equilibrium, 2004, Games Econ. Behav.

[46] Mohammad Taghi Hajiaghayi, et al. Optimal Strategies of Blotto Games: Beyond Convexity, 2019, EC.

[47] Theodore L. Turocy. A dynamic homotopy interpretation of the logistic quantal response equilibrium correspondence, 2005, Games Econ. Behav.

[48] R. McKelvey, et al. Quantal Response Equilibria for Normal Form Games, 1995.

[49] Enric Boix-Adsera, et al. The Multiplayer Colonel Blotto Game, 2020, EC.

[50] Yoav Shoham, et al. Run the GAMUT: a comprehensive approach to evaluating game-theoretic algorithms, 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004).

[51] John Fearnley, et al. Finding Approximate Nash Equilibria of Bimatrix Games via Payoff Queries, 2013, ACM Trans. Economics and Comput.

[52] Tom Eccles, et al. Learning to Play No-Press Diplomacy with Best Response Policy Iteration, 2020, NeurIPS.

[53] Yakov Babichenko, et al. Query complexity of approximate nash equilibria, 2013, STOC.

[54] A. Talman, et al. Simplicial variable dimension algorithms for solving the nonlinear complementarity problem on a product of unit simplices using a general labelling, 1987.

[55] Mark Fey, et al. Symmetric games with only asymmetric equilibria, 2012, Games Econ. Behav.

[56] Paul W. Goldberg, et al. Learning equilibria of games via payoff queries, 2013, EC '13.

[57] Andrew McLennan, et al. Gambit: Software Tools for Game Theory, 2006.

[58] R. Sutton, et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation, 2008, NIPS.

[59] Vincent Conitzer, et al. Mixed-Integer Programming Methods for Finding Nash Equilibria, 2005, AAAI.

[60] D. Fudenberg, et al. The Theory of Learning in Games, 1998.

[61] Georgios Piliouras, et al. From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization, 2020, ICML.