Paper Information - Modification of UCT with Patterns in Monte-Carlo Go

Modification of UCT with Patterns in Monte-Carlo Go

The UCB1 algorithm for the multi-armed bandit problem has already been extended to the UCT algorithm (Upper bound Confidence for Tree), which works for minimax tree search. We have developed a Monte-Carlo Go program, MoGo, which is the first computer Go program to use UCT. We explain our modification of UCT for Go and the intelligent random simulation with patterns, which has significantly improved MoGo's performance. We also discuss UCT combined with pruning techniques for large Go boards, as well as the parallelization of UCT. MoGo is now a top-level Go program on $9\times9$ and $13\times13$ boards.
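As a rough illustration of the bandit rule the abstract builds on, the following is a minimal sketch of standard UCB1 arm selection, assuming rewards lie in $[0, 1]$. It is not MoGo's code: the function name `ucb1_select` and the `counts`/`totals` bookkeeping are hypothetical choices made for this example. UCT applies a rule of this kind at each node of the search tree, treating the candidate moves as arms.

```python
import math


def ucb1_select(counts, totals):
    """Pick the next arm to play under the standard UCB1 rule.

    counts[j] is how many times arm j has been played so far;
    totals[j] is the sum of rewards observed from arm j.
    (Hypothetical helper for illustration only, not taken from MoGo.)
    """
    # Play each arm once first so every arm has a defined empirical mean.
    for j, c in enumerate(counts):
        if c == 0:
            return j

    n = sum(counts)  # total number of plays so far

    def score(j):
        mean = totals[j] / counts[j]                       # exploitation term
        bonus = math.sqrt(2.0 * math.log(n) / counts[j])   # exploration term
        return mean + bonus

    # Choose the arm with the highest upper confidence bound.
    return max(range(len(counts)), key=score)
```

With equal play counts the arm with the higher empirical mean is chosen; an arm that has been played far less often than the others can still be selected because its exploration term is large.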
