Reinforcement Learning of Local Shape in the Game of Go

We explore how a reinforcement learning approach based on a linear evaluation function and large numbers of binary features can be applied to the game of Go. This strategy has proved effective in game-playing programs and other reinforcement learning applications. We apply it to Go by creating over a million features based on templates for small fragments of the board, and then learning their weights with temporal difference learning and self-play. This method identifies hundreds of low-level shapes with recognisable significance to expert Go players, and provides quantitative estimates of their values. We analyse the relative contributions to performance of templates of different types and sizes. Our results show that small, translation-invariant templates are surprisingly effective. We assess the performance of our program by playing against the Average Liberty Player and a variety of computer opponents on the 9×9 Computer Go Server. Our linear evaluation function appears to outperform all other static evaluation functions that do not incorporate substantial domain knowledge.
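The abstract does not state the exact form of the evaluation function or the update rule, so the following Python snippet is only a minimal sketch of the general idea under assumptions of our own. Shape features come from square templates (here just 1×1 and 2×2). Weights are shared across board locations, so a shape is translation-invariant and a shape occurring n times contributes n times its weight. A logistic-linear value estimates the probability that Black wins, and a semi-gradient TD(0) update runs along the positions of one game. The names `features`, `value` and `td0_episode`, the board encoding, and the learning rate are hypothetical illustrations, not the paper's implementation. Move generation, captures and legality are omitted, so the toy "game" is simply a fixed sequence of positions.

```python
import math
from collections import Counter

EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical point encoding


def features(board, sizes=(1, 2)):
    """Count every k x k window of a square board, for each template size k.

    The key (k, contents) ignores where the window sits, so all occurrences
    of a shape share one weight (translation invariance). Each occurrence is
    a binary feature; with shared weights the value counts occurrences.
    """
    n = len(board)
    phi = Counter()
    for k in sizes:
        for r in range(n - k + 1):
            for c in range(n - k + 1):
                shape = tuple(board[r + i][c + j] for i in range(k) for j in range(k))
                phi[(k, shape)] += 1
    return phi


def value(weights, phi):
    """Logistic-linear evaluation: estimated probability that Black wins."""
    z = sum(weights.get(f, 0.0) * x for f, x in phi.items())
    return 1.0 / (1.0 + math.exp(-z))


def td0_episode(weights, positions, outcome, alpha=0.1, sizes=(1, 2)):
    """Semi-gradient TD(0) along one game (assumed update rule).

    positions: boards in move order; outcome: 1.0 if Black won, else 0.0.
    Each position's value moves toward the next position's value, and the
    final position's value moves toward the game outcome.
    """
    for t, board in enumerate(positions):
        phi = features(board, sizes)
        v = value(weights, phi)
        if t + 1 < len(positions):
            target = value(weights, features(positions[t + 1], sizes))
        else:
            target = outcome
        delta = target - v                      # TD error
        scale = alpha * delta * v * (1.0 - v)   # v(1 - v) is the logistic derivative
        for f, x in phi.items():
            weights[f] = weights.get(f, 0.0) + scale * x
    return weights


if __name__ == "__main__":
    # Toy 3x3 "game": Black plays the centre, White a corner, Black wins.
    b0 = [[EMPTY] * 3 for _ in range(3)]
    b1 = [row[:] for row in b0]
    b1[1][1] = BLACK
    b2 = [row[:] for row in b1]
    b2[0][0] = WHITE
    w = {}
    for _ in range(200):
        td0_episode(w, [b0, b1, b2], outcome=1.0)
    print(f"V(final position) = {value(w, features(b2)):.3f}")
```

Because every weight is keyed by shape rather than by location, a 1×1 or 2×2 pattern learned in one part of the board is immediately reused everywhere else; a location-dependent variant would simply add the window's coordinates to the key.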
