The Last-Good-Reply Policy for Monte-Carlo Go

The dominant paradigm for computer-Go players is Monte-Carlo Tree Search (MCTS). This algorithm builds a search tree by playing many simulated games (playouts). Each playout consists of a sequence of moves within the tree followed by many moves beyond the tree. Moves beyond the tree are generated by a biased random sampling policy. This note presents a dynamic sampling policy that takes advantage of information from previous playouts. The policy makes moves that, in previous playouts, have been successful replies to immediately preceding moves. Experimental results show that the policy provides a large improvement in playing strength.
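The policy described above can be sketched in a few lines. The class below is a hypothetical illustration, not the authors' implementation: it keeps, for each player, a table mapping an opponent's move to the reply that most recently appeared in a winning playout, plays that stored reply when it is legal, and otherwise falls back to random sampling. The names `LastGoodReply`, `choose_move`, and `update` are assumptions for the sketch.

```python
import random

class LastGoodReply:
    """Sketch of a last-good-reply playout policy (hypothetical API)."""

    def __init__(self):
        # replies[player][previous_move] -> reply that last won for player
        self.replies = {0: {}, 1: {}}

    def choose_move(self, player, previous_move, legal_moves, rng=random):
        # Play the stored reply if it is still legal; otherwise fall
        # back to the default (here: uniform random) sampling policy.
        reply = self.replies[player].get(previous_move)
        if reply in legal_moves:
            return reply
        return rng.choice(sorted(legal_moves))

    def update(self, moves, winner):
        # After a playout, record each of the winner's moves as the
        # last good reply to the move that immediately preceded it.
        for i in range(1, len(moves)):
            player = i % 2  # assume player 0 made the first move
            if player == winner:
                self.replies[winner][moves[i - 1]] = moves[i]
```

In a full MCTS player this table would be consulted during the beyond-tree portion of each playout and updated once the playout's winner is known; a production policy would combine it with the usual pattern-based move biases rather than a uniform fallback.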
