Fast Payoff Matrix Sparsification Techniques for Structured Extensive-Form Games

The practical scalability of many optimization algorithms for large extensive-form games is often limited by the games' huge payoff matrices. To ameliorate the issue, Zhang and Sandholm (2020) recently proposed a sparsification technique that factorizes the payoff matrix A into a sparser object A = Â + UV^⊤, where the total combined number of nonzeros of Â, U, and V is significantly smaller than the number of nonzeros of A. Such a factorization can be used in place of the original payoff matrix in many optimization algorithms, such as interior-point and second-order methods, thus increasing the size of games that can be handled. Their technique significantly sparsifies poker (end)games, standard benchmarks used in computational game theory, AI, and more broadly. We show that the existence of extremely sparse factorizations in poker games can be tied to their particular Kronecker-product structure. We clarify how such structure arises and introduce the connection between that structure and sparsification. By leveraging such structure, we give two ways of computing strong sparsifications of poker games (as well as any other game with a similar structure) that i) are orders of magnitude faster to compute, ii) are more numerically stable, and iii) produce a dramatically smaller number of nonzeros than the prior technique. Our techniques enable, for the first time, effective computation of high-precision Nash equilibria and of strategies subject to constraints on the amount of allowed randomization. Furthermore, they significantly speed up parallel first-order game-solving algorithms; we show state-of-the-art speed on a GPU.
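As a rough illustration of why a factorization A = Â + UV^⊤ helps, the sketch below computes a matrix-vector product through the factors without ever forming the dense matrix. The toy matrix (all ones plus a diagonal) and the helper names `matvec_dense`, `matvec_factored`, and `count_nonzeros` are assumptions made for this example only; it is not a poker payoff matrix and does not reproduce the paper's sparsification algorithms.

```python
# Illustrative sketch only: a toy matrix, not a real game payoff matrix.

def matvec_dense(A, x):
    """Plain dense product A x, with A given as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_factored(A_hat, U, V, x):
    """Compute (A_hat + U V^T) x using only the factors.

    A_hat is a dict {(i, j): value} holding the nonzeros of the sparse part.
    U is n x k and V is m x k, both given as lists of rows.
    """
    n, k = len(U), len(U[0])
    # t = V^T x, a vector of length k
    t = [sum(V[j][c] * x[j] for j in range(len(V))) for c in range(k)]
    # y = U t
    y = [sum(U[i][c] * t[c] for c in range(k)) for i in range(n)]
    # add the contribution of the sparse part A_hat
    for (i, j), value in A_hat.items():
        y[i] += value * x[j]
    return y

def count_nonzeros(rows):
    """Number of nonzero entries in a matrix given as a list of rows."""
    return sum(1 for row in rows for v in row if v != 0)

n = 6
# Dense toy matrix: every entry is 1, and entry (i, i) gets an extra i + 1.
A = [[1.0 + (i + 1 if i == j else 0) for j in range(n)] for i in range(n)]

# The same matrix written as A_hat + U V^T with a rank-1 correction (k = 1).
A_hat = {(i, i): float(i + 1) for i in range(n)}
U = [[1.0] for _ in range(n)]
V = [[1.0] for _ in range(n)]

x = [float(j) for j in range(n)]

dense_nnz = count_nonzeros(A)  # n * n entries are stored and touched
factored_nnz = len(A_hat) + count_nonzeros(U) + count_nonzeros(V)  # 3 * n
y_dense = matvec_dense(A, x)
y_factored = matvec_factored(A_hat, U, V, x)
```

In this toy case the dense representation stores n² nonzeros while the factored one stores 3n, and both routes give the same product. Iterative solvers that only need products with A and its transpose can work with the factors directly.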

Tuomas Sandholm | Gabriele Farina

[1] B. Stengel, et al. Efficient Computation of Behavior Strategies, 1996.

[2] Tuomas Sandholm, et al. Lossless abstraction of imperfect information games, 2007, JACM.

[3] Michael Bowling, et al. Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games, 2021, ICML.

[4] Tuomas Sandholm, et al. Practical exact algorithm for trembling-hand equilibrium refinements in games, 2018, NeurIPS.

[5] Noam Brown, et al. Superhuman AI for multiplayer poker, 2019, Science.

[6] Kevin Waugh, et al. Accelerating Best Response Calculation in Large Extensive Games, 2011, IJCAI.

[7] Yurii Nesterov, et al. Excessive Gap Technique in Nonsmooth Convex Minimization, 2005, SIAM J. Optim.

[8] Tuomas Sandholm, et al. Solving Imperfect-Information Games via Discounted Regret Minimization, 2018, AAAI.

[9] Neil Burch, et al. Heads-up limit hold’em poker is solved, 2015, Science.

[10] Arkadi Nemirovski, et al. Prox-Method with Rate of Convergence O(1/t) for Variational Inequalities with Lipschitz Continuous Monotone Operators and Smooth Convex-Concave Saddle Point Problems, 2004, SIAM J. Optim.

[11] Kevin Waugh. A Fast and Optimal Hand Isomorphism Algorithm, 2013, AAAI 2013.

[12] Eric van Damme, et al. Non-Cooperative Games, 2000.

[13] Tuomas Sandholm, et al. Reduced Space and Faster Convergence in Imperfect-Information Games via Pruning, 2017, ICML.

[14] Jonathan Schaeffer, et al. The challenge of poker, 2002, Artif. Intell.

[15] Tuomas Sandholm, et al. Solving Large Sequential Games with the Excessive Gap Technique, 2018, NeurIPS.

[16] Kevin Waugh, et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker, 2017, Science.

[17] D. Koller, et al. Efficient Computation of Equilibria for Extensive Two-Person Games, 1996.

[18] J. Nash. Equilibrium Points in N-Person Games, 1950, Proceedings of the National Academy of Sciences of the United States of America.

[19] Norman Zadeh, et al. Computation of Optimal Poker Strategies, 1977, Oper. Res.

[20] Tuomas Sandholm, et al. Sparsified Linear Programming for Zero-Sum Equilibrium Finding, 2020, ICML.

[21] Bill Chen, et al. The Mathematics of Poker, 2006.

[22] Christian Kroer, et al. First-Order Methods with Increasing Iterate Averaging for Solving Saddle-Point Problems, 2019, ArXiv.

[23] Tuomas Sandholm, et al. Faster Game Solving via Predictive Blackwell Approachability: Connecting Regret Matching and Mirror Descent, 2020, AAAI.

[24] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.

[25] Noam Brown, et al. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals, 2018, Science.

[26] Javier Peña, et al. Smoothing Techniques for Computing Nash Equilibria of Sequential Games, 2010, Math. Oper. Res.

[27] Donald A. Waterman, et al. Generalization Learning Techniques for Automating the Learning of Heuristics, 1970, Artif. Intell.

[28] Tuomas Sandholm, et al. Better Regularization for Sequential Decision Spaces: Fast Convergence Rates for Nash, Correlated, and Team Equilibria, 2021, EC.

[29] Tuomas Sandholm, et al. Computing equilibria by incorporating qualitative models, 2010, AAMAS.

[30] Kevin Waugh, et al. Solving Large Extensive-Form Games with Strategy Constraints, 2018, AAAI.

[31] Oskari Tammelin, et al. Solving Large Imperfect Information Games Using CFR+, 2014, ArXiv.