论文信息 - Monte-Carlo Tree Search by Best Arm Identification

Monte-Carlo Tree Search by Best Arm Identification

Recent advances in bandit tools and techniques for sequential learning are steadily enabling new applications and are promising the resolution of a range of challenging related problems. We study the game tree search problem, where the goal is to quickly identify the optimal move in a given game tree by sequentially sampling its stochastic payoffs. We develop new algorithms for trees of arbitrary depth, that operate by summarizing all deeper levels of the tree into confidence intervals at depth one, and applying a best arm identification procedure at the root. We prove new sample complexity guarantees with a refined dependence on the problem instance. We show experimentally that our algorithms outperform existing elimination-based algorithms and match previous special-purpose methods for depth-two trees.

Wouter M. Koolen | Emilie Kaufmann | E. Kaufmann

[1] Tristan Cazenave,et al. Ieee Transactions on Computational Intelligence and Ai in Games 1 Sequential Halving Applied to Trees , 2022 .

[2] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[3] Matthew Malloy,et al. lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits , 2013, COLT.

[4] R. Munos,et al. Best Arm Identification in Multi-Armed Bandits , 2010, COLT.

[5] Jonathan Schaeffer,et al. Best-First Fixed-Depth Minimax Algorithms , 1996, J. Int. Comput. Games Assoc..

[6] Aurélien Garivier,et al. Optimal Best Arm Identification with Fixed Confidence , 2016, COLT.

[7] Ambuj Tewari,et al. PAC Subset Selection in Stochastic Multi-armed Bandits , 2012, ICML.

[8] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .

[9] Mark H. M. Winands,et al. Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search , 2014, CGW@ECAI.

[10] Shie Mannor,et al. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems , 2006, J. Mach. Learn. Res..

[11] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.