Building Evaluation Functions for Chess and Shogi with Uniformity Regularization Networks

Building evaluation functions for chess variants is a challenging goal. So far, only AlphaZero has succeeded, using millions of self-play records produced with thousands of tensor processing units (TPUs), which are not available to most researchers. This paper takes on the challenge of training evaluation functions based on deep convolutional neural networks with modest data and computing resources, where regularization is crucial because complex models trained on limited data are prone to overfitting. We present a novel training scheme that introduces a uniformity regularization (UR) network. In the proposed approach, a value network and a discriminator network share common convolutional layers and are trained simultaneously. Following the comparison training method, the loss functions of both networks are based on the difference between the score of a random move and that of an expert’s move. The value network is expected to give precise scores for all positions, while the discriminator makes qualitative evaluations of move pairs and acts as a regularizer that penalizes differences in evaluation results so that all samples are uniformly discriminated. Because of the shared layers, this regularization improves the overall accuracy of the value network. Experimental results for chess and shogi demonstrate that the proposed method surpassed standard L2 regularization and helped obtain reasonably accurate value networks.
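To make the training scheme concrete, the following is a minimal PyTorch sketch of one possible reading of the approach described above: a shared convolutional trunk feeding a value head and a discriminator head, trained jointly on (expert move, random move) pairs. The layer sizes, the 18-plane 8x8 input encoding, the logistic comparison loss, and the variance-based uniformity term are illustrative assumptions, not the exact definitions used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class URModel(nn.Module):
    """Value network and discriminator sharing convolutional layers (sketch)."""

    def __init__(self, in_planes=18, channels=64, board=8):
        super().__init__()
        # Shared convolutional layers used by both heads.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Value head: scalar score for a single position.
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * board * board, 256),
            nn.ReLU(), nn.Linear(256, 1),
        )
        # Discriminator head: qualitative judgement on an (expert, random) pair.
        self.disc_head = nn.Sequential(
            nn.Flatten(), nn.Linear(2 * channels * board * board, 256),
            nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, pos_expert, pos_random):
        h_e, h_r = self.trunk(pos_expert), self.trunk(pos_random)
        v_e, v_r = self.value_head(h_e), self.value_head(h_r)
        d = self.disc_head(torch.cat([h_e, h_r], dim=1))
        return v_e, v_r, d


def ur_losses(v_e, v_r, d):
    # Comparison training: the position after the expert's move should score
    # higher than the one after the random move (logistic loss on the gap).
    value_loss = F.softplus(-(v_e - v_r)).mean()
    # Discriminator: every pair should be recognised as an expert preference.
    disc_loss = F.binary_cross_entropy_with_logits(d, torch.ones_like(d))
    # Uniformity term (assumption): penalise spread in the discriminator's
    # confidence so all samples are uniformly discriminated.
    uniformity = torch.sigmoid(d).var()
    return value_loss + disc_loss + uniformity


if __name__ == "__main__":
    model = URModel()
    pos_expert = torch.randn(4, 18, 8, 8)
    pos_random = torch.randn(4, 18, 8, 8)
    loss = ur_losses(*model(pos_expert, pos_random))
    loss.backward()
    print(float(loss))
```

Because the uniformity term back-propagates through the shared trunk, any benefit the discriminator gains from treating all pairs consistently is also available to the value head, which is the mechanism the abstract attributes the accuracy improvement to.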
