Reinforcement learning and the effects of parameter settings in the game of Chung Toi

This work applied reinforcement learning and the temporal difference algorithm TD(λ) to train a neural network to play Chung Toi, a challenging variant of Tic-Tac-Toe. The effects of changing the parameters and settings of TD(λ) and of the neural network were evaluated by observing how well the network learned the game and how it performed against a ‘smart’ random player. Techniques that have proven effective for training neural networks in general were also applied within the TD(λ) algorithm. The basic implementation of TD(λ) produced stable performance and won at most 90.4% of evaluation games. Among the parameter changes, the best performance came from using different learning rates for different layers of the neural network (92.6% wins), followed by using a relatively high probability of action exploitation (91.8% wins).
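The training setup summarized above can be sketched in Python. This is a minimal sketch under stated assumptions, not the study's implementation: a one-hidden-layer sigmoid network without bias terms, accumulating eligibility traces, no intermediate rewards with the game outcome as the terminal reward, γ = 1, and a hypothetical board encoding passed in as a list of floats. The names `TDNet`, `train_episode`, and `choose_move`, and every default value, are illustrative placeholders. The separate `alpha_hidden` and `alpha_output` arguments show one way to use different learning rates between layers, and `p_exploit` is the probability of taking the greedy (exploiting) move, i.e. the complement of ε in ε-greedy selection.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TDNet:
    """Hypothetical one-hidden-layer value network trained with TD(lambda).

    alpha_hidden and alpha_output are separate learning rates for the
    input->hidden and hidden->output weights (placeholder values, not the
    study's settings). No bias terms are used, to keep the sketch short.
    """

    def __init__(self, n_in, n_hidden, alpha_hidden=0.1, alpha_output=0.01,
                 lam=0.7, gamma=1.0, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.alpha_hidden, self.alpha_output = alpha_hidden, alpha_output
        self.lam, self.gamma = lam, gamma
        self.reset_traces()

    def reset_traces(self):
        # Eligibility traces (one per weight), cleared at the start of each game.
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        v = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)))
        return v, h

    def value(self, x):
        return self.forward(x)[0]

    def update(self, x, target):
        """One TD(lambda) step for state x toward target; returns the TD error."""
        v, h = self.forward(x)
        delta = target - v
        dv = v * (1.0 - v)  # derivative of the output sigmoid
        decay = self.gamma * self.lam
        for j, hj in enumerate(h):
            # dV/d(hidden pre-activation j), using w2[j] before it is updated.
            g = dv * self.w2[j] * hj * (1.0 - hj)
            for i, xi in enumerate(x):
                self.e1[j][i] = decay * self.e1[j][i] + g * xi
                self.w1[j][i] += self.alpha_hidden * delta * self.e1[j][i]
            self.e2[j] = decay * self.e2[j] + dv * hj
            self.w2[j] += self.alpha_output * delta * self.e2[j]
        return delta


def train_episode(net, states, final_reward):
    """Run TD(lambda) over one game's encoded positions.

    Each position is moved toward the (discounted) value of the next one;
    the last position's target is the game outcome (e.g. 1.0 win, 0.0 loss).
    """
    net.reset_traces()
    for t in range(len(states) - 1):
        net.update(states[t], net.gamma * net.value(states[t + 1]))
    net.update(states[-1], final_reward)


def choose_move(net, afterstates, p_exploit, rng):
    """Return the index of the chosen afterstate.

    With probability p_exploit take the highest-valued afterstate (exploit);
    otherwise pick uniformly at random (explore), i.e. epsilon = 1 - p_exploit.
    """
    if rng.random() < p_exploit:
        return max(range(len(afterstates)), key=lambda k: net.value(afterstates[k]))
    return rng.randrange(len(afterstates))
```

In use, `choose_move` would be called with the encodings of all legal afterstates at each turn, and `train_episode(net, encoded_positions, 1.0)` (or `0.0` after a loss) once the game ends. Keeping the two learning rates as separate arguments makes it easy to vary them independently, which is the kind of layer-wise setting the comparison above refers to.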
