Scalable Distributed Monte-Carlo Tree Search

Monte-Carlo Tree Search (MCTS) has been remarkably successful in two-player games, but it is notoriously difficult to parallelize in a way that scales well, especially in distributed environments. For distributed parallel search, transposition-table driven scheduling (TDS) is known to be efficient in several domains. We present a massively parallel MCTS algorithm that applies TDS parallelism to Upper Confidence bounds applied to Trees (UCT), the most representative MCTS algorithm. To drastically reduce communication overhead, we introduce a reformulation of UCT called Depth-First UCT. We evaluate the parallel performance of the algorithm on clusters of up to 1,200 cores using artificial game trees, and show that the approach scales well, achieving a 740-fold speedup in the best case.
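As a rough illustration of two ingredients named above, the sketch below shows the UCB1 child-selection rule that UCT applies at each tree node, and the hash-based mapping that TDS uses to assign each search state to a home process. It is a minimal sketch under stated assumptions, not the paper's implementation. The `Node` structure, the exploration constant `c = sqrt(2)` (the standard UCB1 choice), the string state key, and the use of SHA-256 in place of Zobrist hashing are all hypothetical. Depth-First UCT itself is not shown.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Per-node statistics kept by UCT (hypothetical minimal structure)."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node


def ucb1_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """UCB1 value used by UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node, c: float = math.sqrt(2)):
    """Return the (move, child) pair with the highest UCB1 score."""
    return max(node.children.items(),
               key=lambda item: ucb1_score(item[1], node.visits, c))


def home_worker(state_key: str, num_workers: int) -> int:
    """TDS-style ownership: a hash of the state picks the worker that stores it.

    SHA-256 over a string key is a stand-in assumption; game programs
    typically use Zobrist hashing of the board position instead.
    """
    digest = hashlib.sha256(state_key.encode()).digest()
    return int.from_bytes(digest[:8], "little") % num_workers


if __name__ == "__main__":
    # Two children with equal total reward: the less-visited one has a
    # higher mean and a larger exploration bonus, so UCB1 prefers it.
    root = Node(visits=10, children={
        "a": Node(visits=6, total_reward=3.0),
        "b": Node(visits=4, total_reward=3.0),
    })
    move, _ = select_child(root)
    print(move, home_worker("example-state", 8))
```

In a TDS setting, a worker that reaches a state owned by another process sends the search job to that state's home worker instead of fetching the node's statistics, so each node's UCT counters live in exactly one place without a shared table.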
