Mimicking Go Experts with Convolutional Neural Networks

Building a strong computer Go player is a longstanding open problem. In this paper we consider the related problem of predicting the moves made by Go experts in professional games. The ability to predict experts' moves is useful because it can, in principle, be used to narrow the search done by a computer Go player. We applied an ensemble of convolutional neural networks to this problem. Our main result is that the ensemble learns to predict 36.9% of the moves made in expert Go games from the test set, improving upon the state of the art, and that the best single convolutional neural network of the ensemble achieves 34% accuracy. This network has fewer than 10^4 parameters.
