The Power of Forgetting: Improving the Last-Good-Reply Policy in Monte Carlo Go

The dominant paradigm for programs playing the game of Go is Monte Carlo tree search. This algorithm builds a search tree by playing many simulated games (playouts). Each playout consists of a sequence of moves within the tree followed by many moves beyond the tree. Moves beyond the tree are generated by a biased random sampling policy. The recently published last-good-reply policy plays moves that, in previous playouts, have been successful replies to the immediately preceding moves. This paper presents a modification of this policy that not only remembers moves that recently succeeded but also immediately forgets moves that recently failed. This modification provides a large improvement in playing strength. We also show that responding to the previous two moves is superior to responding to the previous move alone. Surprisingly, remembering the win rate of every reply performs much worse than simply remembering the last good reply (and indeed worse than not storing good replies at all).
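The reply-table mechanism described above can be sketched as follows. This is a minimal illustration of a last-good-reply table with forgetting, keyed on the single previous move (the one-move variant); all names, the move representation, and the playout encoding are assumptions for illustration, not the paper's implementation.

```python
import random

class LastGoodReplyTable:
    """Maps an opponent's previous move to the last reply that won a playout."""

    def __init__(self):
        self.reply = {}  # previous move -> stored good reply

    def update(self, playout, black_won):
        """After a playout, remember winners' replies and forget losers' replies.

        `playout` is the full move sequence; even indices are Black's moves
        (an assumed encoding for this sketch).
        """
        for i in range(1, len(playout)):
            prev, move = playout[i - 1], playout[i]
            mover_is_black = (i % 2 == 0)
            if mover_is_black == black_won:
                # The side that played `move` won: remember it as a good reply.
                self.reply[prev] = move
            elif self.reply.get(prev) == move:
                # The side that played `move` lost: immediately forget it.
                del self.reply[prev]

    def suggest(self, prev_move, legal_moves):
        """Return the stored reply if legal, else fall back to a random move."""
        move = self.reply.get(prev_move)
        if move in legal_moves:
            return move
        return random.choice(sorted(legal_moves))
```

The two-move variant the paper favors would key the table on the previous two moves (e.g. a tuple `(move[i-2], move[i-1])`) instead of one; the update and forgetting logic is otherwise the same.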
