Automated Action Abstraction of Imperfect Information Extensive-Form Games

Multi-agent decision problems can often be formulated as extensive-form games. We focus on imperfect information extensive-form games in which one or more actions at many decision points have an associated continuous or many-valued parameter. A stock trading agent, in addition to deciding whether or not to buy, must decide how much to buy. In no-limit poker, in addition to selecting a probability for each action, the agent must decide how much to bet for each betting action. Selecting values for these parameters makes such games extremely large: two-player no-limit Texas Hold'em poker with stacks of 500 big blinds has approximately 10^71 states, more than 10^50 times the number of states in two-player limit Texas Hold'em. The main contribution of this paper is a technique that abstracts a game's action space by selecting one, or a small number, of the many values for each parameter. We show that strategies computed using this new algorithm for no-limit Leduc poker exhibit significant utility gains over ε-Nash equilibrium strategies computed with standard, hand-crafted parameter value abstractions.
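To make the parameter-selection problem concrete, the sketch below illustrates the kind of hand-crafted action abstraction such a technique is measured against: a small fixed set of pot-fraction bet sizes, a translation rule that maps an arbitrary real bet onto that set, and a crude grid-search stand-in for automated selection of a single bet-size parameter. This is a minimal sketch, not the paper's algorithm; all names (POT_FRACTIONS, abstract_actions, translate, select_bet_fraction, solve_and_evaluate) are illustrative assumptions rather than identifiers from the paper.

```python
import math

# Illustrative only: a hand-crafted bet-size abstraction of the kind used
# as a baseline in no-limit poker. All names here are hypothetical.

POT_FRACTIONS = [0.5, 1.0, 2.0]  # half pot, pot, double pot


def abstract_actions(pot: float, stack: float) -> list[float]:
    """Enumerate the abstract bet sizes available at a decision point."""
    bets = [f * pot for f in POT_FRACTIONS if f * pot <= stack]
    bets.append(stack)  # the all-in bet is always kept as an abstract action
    return sorted(set(bets))


def translate(real_bet: float, pot: float, stack: float) -> float:
    """Map a real bet (> 0) onto the nearest abstract bet size, using the
    geometric (log-scale) distance common in poker state translation."""
    return min(abstract_actions(pot, stack),
               key=lambda b: abs(math.log(b / real_bet)))


def select_bet_fraction(candidates, solve_and_evaluate):
    """Crude stand-in for automated selection of a single bet-size
    parameter: solve the abstract game induced by each candidate and keep
    the most profitable one.  solve_and_evaluate is a hypothetical oracle,
    e.g. an equilibrium solver plus a best-response evaluation."""
    return max(candidates, key=solve_and_evaluate)
```

For example, with a pot of 100 and a stack of 1000, translate(130.0, 100.0, 1000.0) returns 100.0: an opponent's unusual bet of 130 is answered as if it were a pot-sized bet, which is exactly the kind of coarse mapping that automated parameter selection aims to improve on.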
