Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization

Online search in games has always been a core interest of artificial intelligence. Advances made in search for perfect information games (such as Chess, Checkers, Go, and Backgammon) have led to AI capable of defeating the world's top human experts. Search in imperfect information games (such as Poker, Bridge, and Skat) is significantly more challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling (OOS), the first imperfect information search algorithm that is guaranteed to converge to an equilibrium strategy in two-player zero-sum games. We show that OOS avoids common problems encountered by existing search algorithms, and we experimentally evaluate its convergence rate and practical performance against benchmark strategies in Liar's Dice and a variant of Goofspiel. We show that, unlike with Information Set Monte Carlo Tree Search (ISMCTS), the exploitability of the strategies produced by OOS decreases as the amount of search time increases. In practice, OOS performs as well as ISMCTS in head-to-head play while producing strategies with lower exploitability given the same search time.
