A Study of Artificial Neural Network Architectures for Othello Evaluation Functions

In this study, we use temporal difference learning (TDL) to investigate the ability of 20 different artificial neural network (ANN) architectures to learn Othello game-board evaluation functions. The ANN evaluation functions are used to build a strong Othello player with only 1-ply search. In addition to comparing many of the ANN architectures found in the literature, we introduce several new architectures that account for the symmetry of the game board. We explore both embedding the board symmetry into the network architecture through weight sharing and removing it outright from the inputs (symmetry removal). Experiments varying the number of inputs per board square from one to three, the number of hidden nodes, and the number of hidden layers are also performed. We found it advantageous to exploit board symmetry in the form of weight sharing, and that an input encoding of three inputs per square outperformed the one-input-per-square encoding commonly seen in the literature. Furthermore, architectures with only one hidden layer were strongly outperformed by architectures with multiple hidden layers. A standard weighted-square heuristic evaluation function from the literature was used to assess the quality of the trained ANN Othello players. One of the architectures introduced in this study, an ANN implementing weight sharing and consisting of three hidden layers, outperformed the weighted-square heuristic player using a 6-ply minimax search while itself searching only 1 ply.
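To make the weight-sharing idea concrete, the sketch below (an illustrative assumption, not the paper's actual implementation) partitions the 64 squares of an 8x8 board into their orbits under the board's eight symmetries (rotations and reflections) and evaluates a position with one shared weight per orbit. An ANN that ties its input weights along these orbits is guaranteed to score a position and any of its symmetric images identically.

```python
import numpy as np

N = 8  # Othello is played on an 8x8 board


def images(r, c):
    """All 8 dihedral-group images (rotations and reflections) of square (r, c)."""
    out = set()
    for _ in range(4):
        r, c = c, N - 1 - r      # rotate the square 90 degrees
        out.add((r, c))
        out.add((c, r))          # reflect across the main diagonal
    return out


def symmetry_classes():
    """Partition the 64 squares into symmetry orbits; one shared weight per orbit."""
    seen, classes = set(), []
    for r in range(N):
        for c in range(N):
            if (r, c) not in seen:
                orb = images(r, c)
                seen |= orb
                classes.append(sorted(orb))
    return classes


CLASSES = symmetry_classes()
# Map each square to its orbit index so evaluation is a cheap table lookup.
CLASS_OF = {sq: k for k, orb in enumerate(CLASSES) for sq in orb}


def evaluate(board, w):
    """Weighted-square evaluation with weights shared across symmetric squares.

    board: 8x8 array with +1 (own disc), -1 (opponent's disc), 0 (empty).
    w: one weight per symmetry class rather than one per square.
    """
    return sum(w[CLASS_OF[(r, c)]] * board[r, c]
               for r in range(N) for c in range(N))
```

The 8x8 board has exactly 10 such orbits, so weight sharing shrinks the first-layer parameter count from 64 weights to 10 while enforcing symmetry by construction; rotating or reflecting a position leaves `evaluate` unchanged.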
