Computing Equilibria in Multiplayer Stochastic Games of Imperfect Information

Computing a Nash equilibrium in multiplayer stochastic games is a notoriously difficult problem. Prior algorithms have been proven to converge only in extremely limited settings and have been tested only on small problems. In contrast, we recently presented an algorithm for computing approximate jam/fold equilibrium strategies in a three-player no-limit Texas hold'em tournament, a very large real-world stochastic game of imperfect information [5]. In this paper we show that it is possible for that algorithm to converge to a non-equilibrium strategy profile. However, we develop an ex post procedure that determines exactly how much each player can gain by deviating from his strategy, and we confirm that the strategies computed in that paper do constitute an ε-equilibrium for a very small ε (0.5% of the tournament entry fee). Next, we develop several new algorithms for computing a Nash equilibrium in multiplayer stochastic games (with perfect or imperfect information) which provably can never converge to a non-equilibrium. Experiments show that one of these algorithms outperforms the original algorithm on the same poker tournament. In short, we present the first algorithms for provably computing an ε-equilibrium of a large stochastic game for small ε. Finally, we present an efficient algorithm that minimizes external regret in both the perfect and imperfect information cases.
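To make the ε in the abstract concrete, the following is a minimal sketch of an ex post check on a one-shot normal-form game: it measures the largest gain any single player can obtain by deviating unilaterally from a given mixed strategy profile. It is not the paper's procedure, which operates on a stochastic game; the function names (`expected_payoff`, `epsilon_of_profile`), the payoff-table layout, and the matching-pennies example are assumptions chosen purely for illustration.

```python
from itertools import product

def expected_payoff(payoff, profile, player):
    """Expected payoff to `player` when all players follow the mixed `profile`.

    payoff:  dict mapping a tuple of actions to a tuple of payoffs (one per player)
    profile: list of mixed strategies, each a list of action probabilities
    """
    total = 0.0
    for actions in product(*(range(len(s)) for s in profile)):
        prob = 1.0
        for strategy, action in zip(profile, actions):
            prob *= strategy[action]
        total += prob * payoff[actions][player]
    return total

def epsilon_of_profile(payoff, profile):
    """Largest gain any one player gets from a unilateral pure deviation.

    Checking pure deviations suffices: some best response is always pure.
    The profile is an epsilon-equilibrium exactly for epsilon >= this value.
    """
    eps = 0.0
    for i, strategy in enumerate(profile):
        current = expected_payoff(payoff, profile, i)
        for a in range(len(strategy)):
            pure = [1.0 if b == a else 0.0 for b in range(len(strategy))]
            deviated = profile[:i] + [pure] + profile[i + 1:]
            eps = max(eps, expected_payoff(payoff, deviated, i) - current)
    return eps

# Hypothetical example game: matching pennies (player 0 wins on a match).
pennies = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}

uniform = [[0.5, 0.5], [0.5, 0.5]]      # the exact equilibrium
exploitable = [[1.0, 0.0], [0.5, 0.5]]  # player 0 always plays action 0

print(epsilon_of_profile(pennies, uniform))      # no profitable deviation
print(epsilon_of_profile(pennies, exploitable))  # player 1 can gain by mismatching
```

In a stochastic game the same quantity requires a best response over whole policies rather than single actions, so the inner loop over pure actions would be replaced by solving the decision problem each player faces when the others' strategies are held fixed.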

Tuomas Sandholm | Sam Ganzfried

[1] Jonathan Schaeffer, et al. Approximating Game-Theoretic Optimal Strategies for Full-scale Poker, 2003, IJCAI.

[2] D. Fudenberg, et al. The Theory of Learning in Games, 1998.

[3] Xiaofeng Wang, et al. Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games, 2002, NIPS.

[4] Troels Bjerre Lund, et al. A heads-up no-limit Texas Hold'em poker player: discretized betting models and automatically generated equilibrium-finding programs, 2008, AAMAS.

[5] Ronen I. Brafman, et al. Learning to Coordinate Efficiently: A Model-based Approach, 2003, J. Artif. Intell. Res.

[6] Christophe Jermann. Résolution de contraintes géométriques par rigidifications récursive et propagation d'intervalles, 2002.

[7] Michael P. Wellman, et al. Nash Q-Learning for General-Sum Stochastic Games, 2003, J. Mach. Learn. Res.

[8] Tuomas Sandholm, et al. A Competitive Texas Hold'em Poker Player via Automated Abstraction and Real-Time Equilibrium Computation, 2006, AAAI.

[9] Christoph M. Hoffmann, et al. Decomposition Plans for Geometric Constraint Systems, Part I, 2001.

[10] Eric V. Denardo, et al. Flows in Networks, 2011.

[11] G. Laman. On graphs and rigidity of plane skeletal structures, 1970.

[12] Pascal Schreck, et al. Geometric Construction by Assembling Solved Subfigures, 1998, Artif. Intell.

[13] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[14] Christoph M. Hoffmann, et al. Finding Solvable Subsets of Constraint Graphs, 1997, CP.

[15] Yishay Mansour, et al. Fast Planning in Stochastic Games, 2000, UAI.

[16] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[17] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.

[18] Tuomas Sandholm, et al. Computing an approximate jam/fold equilibrium for 3-player no-limit Texas Hold'em tournaments, 2008, AAMAS.

[19] Christoph M. Hoffmann, et al. Geometric constraint solver, 1995, Comput. Aided Des.

[20] Michael L. Littman, et al. Friend-or-Foe Q-learning in General-Sum Games, 2001, ICML.

[21] Xiaotie Deng, et al. Settling the Complexity of Two-Player Nash Equilibrium, 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[22] Peter Bro Miltersen, et al. A near-optimal strategy for a heads-up no-limit Texas Hold'em poker tournament, 2007, AAMAS '07.

[23] Glenn A. Kramer, et al. Solving Geometric Constraint Systems, 1990, AAAI.

[24] Michael H. Bowling, et al. Regret Minimization in Games with Incomplete Information, 2007, NIPS.

[25] Gilles Trombettoni, et al. A Constraint Programming Approach for Solving Rigid Geometric Systems, 2000, CP.

[26] Dominique Michelucci, et al. Qualitative Study of Geometric Constraints, 1998.