Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold'em Agent

The leading approach to solving large imperfect-information games is automated abstraction followed by an equilibrium-finding algorithm. We introduce a distributed version of the most commonly used equilibrium-finding algorithm, counterfactual regret minimization (CFR), which enables CFR to scale to dramatically larger abstractions and numbers of cores. The new algorithm imposes constraints on the abstraction so that the pieces running on different computers are disjoint. We introduce an algorithm for generating such abstractions while capitalizing on state-of-the-art abstraction ideas such as imperfect recall and earth-mover's distance. Our techniques enabled an equilibrium computation of unprecedented size on a supercomputer with high inter-blade memory latency, an architecture on which prior approaches run slowly. Our approach also yields a significant improvement over the prior best approach on a large shared-memory server with low memory latency. Finally, we introduce a family of post-processing techniques that outperform prior ones. We applied these techniques to generate an agent for two-player no-limit Texas Hold'em, called Tartanian7, that won the 2014 Annual Computer Poker Competition, beating each opponent with statistical significance.
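
For readers unfamiliar with CFR, the following is a minimal sketch of regret matching, the standard per-information-set update that CFR applies at each decision point. It is not the distributed algorithm introduced here, and the function name `regret_matching` is a hypothetical name chosen for illustration. Each action is played with probability proportional to its positive cumulative regret, and play is uniform when no action has positive regret.

```python
from typing import List


def regret_matching(cumulative_regrets: List[float]) -> List[float]:
    """Return a strategy proportional to positive cumulative regrets.

    Illustrative helper only (hypothetical name); it shows the basic
    regret-matching rule, not the paper's distributed CFR variant.
    """
    # Negative regrets are clipped to zero: such actions get no weight.
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        # No action has positive regret, so fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


if __name__ == "__main__":
    # Three actions; the second has negative regret and gets zero weight.
    print(regret_matching([2.0, -1.0, 2.0]))
```

In full CFR, this rule is applied at every information set on every iteration, and it is the average strategy over iterations, not the current one, that converges toward equilibrium in two-player zero-sum games.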
