Nested Monte-Carlo Tree Search for Online Planning in Large MDPs

Monte-Carlo Tree Search (MCTS) is the state of the art for online planning in large MDPs. It is a best-first, sample-based search algorithm in which every state in the search tree is evaluated by the average outcome of Monte-Carlo rollouts from that state. These rollouts are typically random or guided by a simple, domain-dependent heuristic. We propose Nested Monte-Carlo Tree Search (NMCTS), in which MCTS itself is used recursively as the rollout policy for higher-level searches. In three large-scale MDPs, SameGame, Clickomania, and Bubble Breaker, we show that NMCTS is significantly more effective than regular MCTS at equal time controls, using both random and heuristic rollouts at the base level. Experiments also suggest that NMCTS outperforms Nested Monte-Carlo Search (NMCS) in some domains.
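To make the recursive structure concrete, below is a minimal Python sketch of one way such a nested search could be organised: a plain UCT search whose rollout policy at level k > 0 is itself a level-(k-1) search, bottoming out in uniformly random playouts at level 0. The MDP interface (actions, step, is_terminal), the Node class, and the reuse of a single budget parameter across levels are illustrative assumptions for this sketch, not the authors' implementation.

    import math
    import random


    class Node:
        """One state in the UCT search tree."""

        def __init__(self, state):
            self.state = state
            self.children = {}       # action -> Node
            self.visits = 0
            self.total_reward = 0.0


    def uct_select(node, c=1.4):
        # Pick the child maximising the UCB1 score (exploration constant c).
        log_n = math.log(node.visits)
        return max(
            node.children.items(),
            key=lambda kv: kv[1].total_reward / kv[1].visits
            + c * math.sqrt(log_n / kv[1].visits),
        )


    def rollout(mdp, state, level, budget):
        # Level-0 rollouts are uniformly random; at level k > 0 every rollout
        # move is chosen by a level-(k-1) search, the core idea of NMCTS.
        total = 0.0
        while not mdp.is_terminal(state):
            if level == 0:
                action = random.choice(mdp.actions(state))
            else:
                action = nmcts(mdp, state, level - 1, budget)
            state, reward = mdp.step(state, action)
            total += reward
        return total


    def nmcts(mdp, root_state, level, budget):
        # Run `budget` UCT iterations from root_state and return the
        # most-visited root action.  The environment is assumed deterministic
        # (as in SameGame), so stepping again during selection reproduces the
        # stored child state.
        root = Node(root_state)
        for _ in range(budget):
            node, path, reward = root, [root], 0.0
            # Selection and expansion.
            while not mdp.is_terminal(node.state):
                untried = [a for a in mdp.actions(node.state)
                           if a not in node.children]
                if untried:
                    action = random.choice(untried)
                    next_state, r = mdp.step(node.state, action)
                    child = Node(next_state)
                    node.children[action] = child
                    reward += r
                    path.append(child)
                    node = child
                    break
                action, child = uct_select(node)
                _, r = mdp.step(node.state, action)
                reward += r
                path.append(child)
                node = child
            # Nested rollout from the newly reached state.
            reward += rollout(mdp, node.state, level, budget)
            # Backpropagation: the total return of the simulated episode is
            # backed up along the whole path, as is common in single-player
            # puzzle domains.
            for n in path:
                n.visits += 1
                n.total_reward += reward
        best_action, _ = max(root.children.items(), key=lambda kv: kv[1].visits)
        return best_action

Under these assumptions, a level-1 search amounts to running a small MCTS at every step of every rollout of the top-level search; in practice the overall time budget would be divided between nesting levels rather than reusing the same iteration count at each level, as this sketch does for brevity.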
