Computing "Elo Ratings" of Move Patterns in the Game of Go

Move patterns are an essential method to incorporate domain knowledge into Go-playing programs. This paper presents a new Bayesian technique for supervised learning of such patterns from game records, based on a generalization of Elo ratings. Each sample move in the training data is considered as a victory of a team of pattern features. Elo ratings of individual pattern features are computed from these victories, and can be used in previously unseen positions to compute a probability distribution over legal moves. In this approach, several pattern features may be combined without an exponential cost in the number of features. Despite a very small number of training games (652), this algorithm outperforms most previous pattern-learning algorithms, both in terms of mean log-evidence (−2.69) and prediction rate (34.9%). A 19×19 Monte-Carlo program improved with these patterns reached the level of the strongest classical programs.

[...] and little domain expertise. This paper presents a new supervised pattern-learning algorithm, based on the Bradley-Terry model. The Bradley-Terry model is the theoretical basis of the Elo rating system. The principle of Elo ratings, as applied to chess, is that each player gets a numerical strength estimate, computed from the observation of past game results. From the ratings of players, it is possible to estimate a probability distribution over the outcome of future games. The same principle
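The "team of pattern features" idea can be illustrated with a minimal sketch. It assumes the standard 400-point Elo scale, under which a rating r corresponds to a Bradley-Terry strength of 10^(r/400), and it treats the strength of a candidate move as the product of the strengths of its features. The feature names, ratings, and function names below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from math import prod
from typing import Dict, List


def elo_to_gamma(rating: float) -> float:
    """Convert an Elo rating to a Bradley-Terry strength (assumed 400-point scale)."""
    return 10.0 ** (rating / 400.0)


def move_probabilities(
    candidates: Dict[str, List[str]], ratings: Dict[str, float]
) -> Dict[str, float]:
    """Return a probability distribution over candidate moves.

    Each move is a "team" of features; its strength is the product of the
    strengths of its features. A move with no features gets strength 1.
    """
    strengths = {
        move: prod(elo_to_gamma(ratings[f]) for f in feats)
        for move, feats in candidates.items()
    }
    total = sum(strengths.values())
    return {move: s / total for move, s in strengths.items()}


# Hypothetical feature ratings and candidate moves (illustrative values only).
ratings = {"capture": 400.0, "atari": 200.0, "edge": 0.0}
candidates = {
    "A": ["capture"],        # strength 10
    "B": ["atari", "atari"], # strength 10^(0.5) * 10^(0.5) = 10
    "C": ["edge"],           # strength 1
}
probs = move_probabilities(candidates, ratings)
```

Because team strength is a product, the cost of evaluating a move grows linearly with the number of features attached to it, which is one way to read the abstract's remark about avoiding an exponential cost. Fitting the ratings from game records is a separate step that this sketch does not cover.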
