Exploration exploitation in Go: UCT for Monte-Carlo Go

The UCB1 algorithm for the multi-armed bandit problem has already been extended to the UCT algorithm, which works for minimax tree search. We have developed a Monte-Carlo program, MoGo, which is the first computer Go program using UCT. We explain our modifications of UCT for the Go application, among them efficient memory management, parametrization, ordering of non-visited nodes, and parallelization. MoGo is now a top-level computer Go program on the 9×9 Go board.
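The abstract builds on UCB1 and its tree extension, UCT. Below is a minimal sketch of plain UCT. It assumes a toy game in place of Go (a single pile from which each player removes 1 to 3 stones, where taking the last stone wins) and uses uniformly random playouts. It does not reproduce MoGo's memory management, parametrization, ordering of non-visited nodes, or parallelization. All names (`uct_search`, `rollout`, `Node`, `legal_moves`) are hypothetical.

```python
from __future__ import annotations

import math
import random
from typing import Optional

# ASSUMPTION: a toy game stands in for Go. One pile of stones; each turn a
# player removes 1, 2 or 3 stones; whoever takes the last stone wins.
# Players are numbered 0 and 1.
MOVES = (1, 2, 3)


def legal_moves(stones: int) -> list[int]:
    return [m for m in MOVES if m <= stones]


class Node:
    def __init__(self, stones: int, to_move: int,
                 parent: Optional[Node] = None, move: Optional[int] = None):
        self.stones = stones
        self.to_move = to_move              # player to move in this position
        self.parent = parent
        self.move = move                    # move that led here from the parent
        self.children: list[Node] = []
        self.untried = legal_moves(stones)  # moves not expanded yet
        self.visits = 0
        self.wins = 0.0                     # wins for the player who moved INTO this node

    def ucb1_child(self, c: float) -> Node:
        # UCB1 score: mean reward + c * sqrt(ln n / n_j); c = sqrt(2) gives
        # the classic bonus sqrt(2 ln n / n_j).
        log_n = math.log(self.visits)
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits + c * math.sqrt(log_n / ch.visits))


def rollout(stones: int, to_move: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end and return the winner."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        to_move = 1 - to_move
    return 1 - to_move  # the player who just moved took the last stone


def uct_search(stones: int, to_move: int, iterations: int,
               c: float = math.sqrt(2), seed: int = 0) -> int:
    """Return the most visited root move after `iterations` UCT playouts."""
    rng = random.Random(seed)
    root = Node(stones, to_move)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB1 through fully expanded nodes.
        while not node.untried and node.children:
            node = node.ucb1_child(c)
        # 2. Expansion: add one unvisited child, chosen at random here
        #    (MoGo's own ordering of non-visited nodes is not reproduced).
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.to_move, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: a random playout from the selected or new node.
        winner = rollout(node.stones, node.to_move, rng)
        # 4. Backpropagation: credit each node to the player who moved into
        #    it, so every parent maximizes from its own mover's viewpoint
        #    (the minimax/negamax view that UCT uses).
        while node is not None:
            node.visits += 1
            if node.parent is not None and node.parent.to_move == winner:
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move
```

For example, from a pile of 5, `uct_search(5, 0, iterations=5000)` should recommend taking 1 stone. That leaves a multiple of four, which is the known winning strategy for this toy game. Recommending the most visited root move rather than the one with the highest mean reward is a common and more stable choice; it is an assumption here, not something the abstract specifies.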

Sylvain Gelly | Yizao Wang