Monte-Carlo Tree Search: To MC or to DP?

State-of-the-art Monte-Carlo tree search algorithms can be parametrized with either of two information-updating procedures: MC-backup and DP-backup. The dynamics of these two procedures are very different, and so far their relative pros and cons have been poorly understood. By formally analyzing the dependence of MC- and DP-backups on various MDP parameters, we reveal numerous important issues that are hidden by worst-case bounds on algorithm performance, and we confirm these findings with a systematic experimental study.
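
For intuition, the sketch below contrasts the two backup rules at the level of a single search-tree node: MC-backup folds each newly sampled return into a running average, while DP-backup recomputes the node's value from the current estimates of its children via a Bellman-style maximization. This is a minimal illustrative sketch assuming a generic MCTS node structure; the class and function names are hypothetical and do not reflect the paper's implementation.

```python
# Illustrative sketch (hypothetical names, not the authors' code) of the two
# information-updating procedures discussed in the abstract.

class Node:
    def __init__(self):
        self.visits = 0
        self.value = 0.0      # current value estimate of this node
        self.children = {}    # action -> (immediate reward, child Node)


def mc_backup(node, sampled_return):
    """MC-backup: fold one sampled return into the node's running average."""
    node.visits += 1
    node.value += (sampled_return - node.value) / node.visits


def dp_backup(node, gamma=1.0):
    """DP-backup: re-derive the node's value from its children (Bellman max)."""
    node.visits += 1
    if node.children:
        node.value = max(r + gamma * child.value
                         for r, child in node.children.values())
```

In a single trial, mc_backup would be applied bottom-up along the sampled trajectory using that trial's return, whereas dp_backup would be applied bottom-up using only the stored child estimates, independently of the return just observed.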
