Improved Monte-Carlo Search

Monte-Carlo search has been successful in many non-deterministic games, and recently in deterministic games with a high branching factor. One drawback of the current approaches is that, even if the iterative process runs for a very long time, the selected move does not necessarily converge to a game-theoretically optimal one. In this paper we introduce a new algorithm, UCT, which extends a bandit algorithm to Monte-Carlo search. It is proven that the probability that the algorithm selects the correct move converges to 1. Moreover, it is shown empirically that the algorithm converges rather fast, even in comparison with alpha-beta search. Experiments in Amazons and Clobber indicate that the UCT algorithm considerably outperforms a plain Monte-Carlo version, and that it is competitive with alpha-beta-based game programs.
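
To make the idea of applying a bandit rule to move selection concrete, here is a minimal sketch. It assumes the bandit algorithm is UCB1, which the abstract does not state, and it covers only a single decision at the root rather than the full tree search. The names `ucb1_select` and `choose_move`, the exploration constant `c`, and the Bernoulli win probabilities that stand in for random playouts are all hypothetical choices made for illustration.

```python
import math
import random


def ucb1_select(counts, totals, c=math.sqrt(2)):
    """Pick the arm with the largest UCB1 score (assumed rule, not from the abstract).

    counts[i] is how often move i was simulated, totals[i] is its summed reward.
    Untried moves are selected first so that every mean is defined.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)

    def score(i):
        mean = totals[i] / counts[i]
        bonus = c * math.sqrt(math.log(total) / counts[i])
        return mean + bonus

    return max(range(len(counts)), key=score)


def choose_move(win_probs, n_sims, seed=0):
    """Allocate n_sims simulated playouts among moves with UCB1.

    win_probs are hypothetical per-move win probabilities; a real program
    would play out the game instead of drawing a Bernoulli sample.
    Returns the most-simulated move and the per-move simulation counts.
    """
    rng = random.Random(seed)
    counts = [0] * len(win_probs)
    totals = [0.0] * len(win_probs)
    for _ in range(n_sims):
        i = ucb1_select(counts, totals)
        reward = 1.0 if rng.random() < win_probs[i] else 0.0
        counts[i] += 1
        totals[i] += reward
    best = max(range(len(counts)), key=lambda i: counts[i])
    return best, counts


if __name__ == "__main__":
    best, counts = choose_move([0.2, 0.8, 0.5], n_sims=2000)
    print("selected move:", best, "simulation counts:", counts)
```

In this sketch the simulations concentrate on the move with the highest estimated value while still occasionally revisiting the others, which is the behaviour a plain Monte-Carlo search with uniform sampling lacks.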
