Convergence of Monte Carlo Tree Search in Simultaneous Move Games

We study Monte Carlo tree search (MCTS) in zero-sum extensive-form games with perfect information and simultaneous moves. We present a general template of MCTS algorithms for these games, which can be instantiated by various selection methods. We formally prove that if a selection method is ε-Hannan consistent in a matrix game and satisfies additional requirements on exploration, then the MCTS algorithm eventually converges to an approximate Nash equilibrium (NE) of the extensive-form game. We empirically evaluate this claim using regret matching and Exp3 as the selection methods on randomly generated games and empirically selected worst-case games. The experiments confirm the formal result and show that additional MCTS variants also converge to approximate NE on the evaluated games.
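To make the matrix-game building block concrete, the following is a minimal sketch of regret matching in self-play on a single zero-sum matrix game. It is an illustration, not the MCTS algorithm analyzed in the paper. It assumes full-information feedback (each player sees the payoff of every action it could have played), a fixed uniform exploration rate `gamma`, and rock-paper-scissors as the example payoff matrix. All function and parameter names are hypothetical.

```python
import random


def regret_matching_strategy(regrets, gamma):
    """Play in proportion to positive cumulative regret, mixed with uniform exploration."""
    n = len(regrets)
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    base = [p / total for p in positive] if total > 0 else [1.0 / n] * n
    return [(1.0 - gamma) * b + gamma / n for b in base]


def self_play(payoff, iterations=20000, gamma=0.05, seed=0):
    """Run both players with regret matching; return their empirical action frequencies.

    payoff[i][j] is the row player's payoff; the column player receives -payoff[i][j].
    Assumption: full-information regret updates, a simplification of the bandit setting.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_counts, col_counts = [0] * n_rows, [0] * n_cols
    for _ in range(iterations):
        row_strategy = regret_matching_strategy(row_regret, gamma)
        col_strategy = regret_matching_strategy(col_regret, gamma)
        i = rng.choices(range(n_rows), weights=row_strategy)[0]
        j = rng.choices(range(n_cols), weights=col_strategy)[0]
        row_counts[i] += 1
        col_counts[j] += 1
        # Regret of action a: what a would have earned minus what was earned.
        for a in range(n_rows):
            row_regret[a] += payoff[a][j] - payoff[i][j]
        # The column player minimizes payoff, so the sign is flipped.
        for b in range(n_cols):
            col_regret[b] += payoff[i][j] - payoff[i][b]
    row_avg = [c / iterations for c in row_counts]
    col_avg = [c / iterations for c in col_counts]
    return row_avg, col_avg


def exploitability(payoff, row_avg, col_avg):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[a][j] * col_avg[j] for j in range(n_cols)) for a in range(n_rows))
    best_col = min(sum(payoff[i][b] * row_avg[i] for i in range(n_rows)) for b in range(n_cols))
    return best_row - best_col


# Rock-paper-scissors payoffs for the row player (an assumed example game).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

if __name__ == "__main__":
    row_avg, col_avg = self_play(RPS)
    print("row frequencies:", row_avg)
    print("column frequencies:", col_avg)
    print("exploitability:", exploitability(RPS, row_avg, col_avg))
```

The quantity to watch is the exploitability of the empirical action frequencies: a small value means the averaged play is close to an NE of the matrix game, which is the single-stage analogue of the convergence property described above. The exploration term keeps every action's probability at least `gamma / n`, a simple stand-in for the exploration requirement mentioned in the abstract.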
