Computing "Elo Ratings" of Move Patterns in the Game of Go

Move patterns are an essential method for incorporating domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is treated as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories and can then be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features can be combined without a cost that grows exponentially with the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in mean log-evidence (−2.69) and in prediction rate (34.9%). A 19×19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.

… and little domain expertise. This paper presents a new supervised pattern-learning algorithm based on the Bradley-Terry model, which is the theoretical basis of the Elo rating system. The principle of Elo ratings, as applied to chess, is that each player receives a numerical estimate of strength, computed from observed past game results. From the players' ratings, it is possible to estimate a probability distribution over the outcomes of future games. The same principle …
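
The team model described above can be illustrated with a short Python sketch. It assumes the usual generalized Bradley-Terry convention: each pattern feature f has a positive strength γ_f, a candidate move's strength is the product of the γ values of its features, and the probability of playing move j is its strength divided by the summed strength of all legal moves. A γ is converted to the Elo scale as r = 400·log10(γ). Fitting uses cyclic minorization-maximization (MM), one standard method for generalized Bradley-Terry models (Hunter); this sketch does not claim to reproduce every detail of the paper's procedure. The feature names, the data layout, the prior (one virtual win and one virtual loss against an opponent of strength 1), and the use of the natural logarithm for log-evidence are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Team = Tuple[str, ...]                # features present on one candidate move
Competition = Tuple[List[Team], int]  # (all legal moves, index of the move played)


def team_strength(team: Team, gamma: Dict[str, float]) -> float:
    """Strength of a move: product of the gammas of its features (1 if none)."""
    return math.prod(gamma[f] for f in team)


def move_distribution(moves: Sequence[Team], gamma: Dict[str, float]) -> List[float]:
    """P(move j) = strength of j / total strength of all legal moves."""
    strengths = [team_strength(t, gamma) for t in moves]
    total = sum(strengths)
    return [s / total for s in strengths]


def elo(g: float) -> float:
    """Express a gamma on the Elo scale: r = 400 * log10(gamma)."""
    return 400.0 * math.log10(g)


def mm_fit(data: List[Competition], iterations: int = 50) -> Dict[str, float]:
    """Fit gammas by cyclic MM updates, one feature at a time.

    Assumed prior: each feature gets one virtual win and one virtual loss
    against an opponent of strength 1, so no gamma collapses to zero.
    """
    features = sorted({f for moves, _ in data for t in moves for f in t})
    gamma = {f: 1.0 for f in features}
    for _ in range(iterations):
        for i in features:
            wins = 1.0                      # virtual win from the prior
            denom = 2.0 / (gamma[i] + 1.0)  # two virtual games vs strength 1
            for moves, winner in data:
                if i in moves[winner]:
                    wins += 1.0
                # Summed strength of i's teammates over the moves containing i.
                c_ij = sum(team_strength(t, gamma) / gamma[i] for t in moves if i in t)
                if c_ij > 0.0:
                    e_j = sum(team_strength(t, gamma) for t in moves)
                    denom += c_ij / e_j
            gamma[i] = wins / denom
    return gamma


def evaluate(data: List[Competition], gamma: Dict[str, float]) -> Tuple[float, float]:
    """Return (mean log-evidence, prediction rate) on the given positions."""
    log_ev, hits = 0.0, 0
    for moves, winner in data:
        p = move_distribution(moves, gamma)
        log_ev += math.log(p[winner])  # natural log (assumed base)
        hits += max(range(len(p)), key=p.__getitem__) == winner
    return log_ev / len(data), hits / len(data)


# Hypothetical toy data: () is a move that matches no feature.
TOY_DATA: List[Competition] = [
    ([("atari",), ("edge",), ()], 0),
    ([("atari",), ()], 0),
    ([("edge",), ()], 1),
]

if __name__ == "__main__":
    g = mm_fit(TOY_DATA)
    for f, v in sorted(g.items()):
        print(f, round(v, 3), round(elo(v), 1))
    print(evaluate(TOY_DATA, g))
```

With this toy data, the hypothetical "atari" feature ends with γ > 1 and "edge" ends with γ < 1, because the first is always on the played move and the second never is. Updating one feature at a time keeps each MM step simple, but it recomputes move strengths repeatedly. A practical implementation would cache them.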

Rémi Coulom