Monte-Carlo Tree Search: To MC or to DP?

State-of-the-art Monte-Carlo tree search algorithms can be parametrized with either of two information-updating procedures: MC-backup and DP-backup. The dynamics of these two procedures are very different, and so far their relative pros and cons have been poorly understood. By formally analyzing the dependence of MC- and DP-backups on various MDP parameters, we reveal numerous important issues that are hidden by worst-case bounds on algorithm performance, and we confirm these findings with a systematic experimental study.
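
For intuition, the sketch below contrasts the two backup rules at the level of a single search-tree node: MC-backup folds each newly sampled return into a running average, while DP-backup recomputes the node's value from the current estimates of its children via a Bellman-style maximization. This is a minimal illustrative sketch assuming a generic MCTS node structure; the class and function names are hypothetical and do not reflect the paper's implementation.

```python
# Illustrative sketch (hypothetical names, not the authors' code) of the two
# information-updating procedures discussed in the abstract.

class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0      # current value estimate of this node
        self.children = {}    # action -> (immediate reward, child Node)


def mc_backup(node, sampled_return):
    """MC-backup: fold one sampled return into the node's running average."""
    node.visits += 1
    node.value += (sampled_return - node.value) / node.visits


def dp_backup(node, gamma=1.0):
    """DP-backup: re-derive the node's value from its children (Bellman max)."""
    node.visits += 1
    if node.children:
        node.value = max(r + gamma * child.value
                         for r, child in node.children.values())
```

In a single trial, mc_backup would be applied bottom-up along the sampled trajectory using that trial's return, whereas dp_backup would be applied bottom-up using only the stored child estimates, independently of the return just observed.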
