An Analysis of Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. Despite the vast amount of research into MCTS, the effect of modifications to the algorithm, as well as its performance across various domains, is not yet fully understood. In particular, the effect of using knowledge-heavy rollouts in MCTS remains poorly understood, with surprising results demonstrating that better-informed rollouts often result in worse-performing agents. We present experimental evidence suggesting that, under certain smoothness conditions, uniformly random simulation policies preserve the ordering over action preferences. This helps explain the success of MCTS despite its common use of such rollouts to evaluate states. We further analyse non-uniformly random rollout policies and describe conditions under which they offer improved performance.
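The uniformly random simulation policy discussed above can be illustrated with a minimal sketch. This is not the paper's experimental setup; the toy domain, function names, and parameters below are illustrative assumptions. It shows the core idea: a state's value is estimated by playing uniformly random actions to termination and averaging the resulting rewards.

```python
import random

def uniform_random_rollout(state, actions, step, is_terminal, reward, rng,
                           max_depth=1000):
    """Play uniformly random actions from `state` until termination (or a
    depth cap) and return the reward of the final state reached."""
    for _ in range(max_depth):
        if is_terminal(state):
            break
        state = step(state, rng.choice(actions(state)))
    return reward(state)

def monte_carlo_value(state, n_rollouts, **kwargs):
    """Average many uniformly random rollouts to estimate the state's value."""
    return sum(uniform_random_rollout(state, **kwargs)
               for _ in range(n_rollouts)) / n_rollouts

# Toy domain (an assumption for illustration): a random walk on {0, ..., 10},
# absorbing at 0 (reward 0) and at 10 (reward 1), starting from state 5.
rng = random.Random(0)
estimate = monte_carlo_value(
    5, 2000,
    actions=lambda s: [-1, +1],
    step=lambda s, a: s + a,
    is_terminal=lambda s: s in (0, 10),
    reward=lambda s: 1.0 if s == 10 else 0.0,
    rng=rng,
)
```

By symmetry the true value of state 5 in this toy domain is 0.5, and even this uninformed policy recovers it in expectation; the paper's point is that such rollouts can also preserve the *ordering* over action values under suitable smoothness conditions, which is what a tree search actually needs.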
