Building Evaluation Functions for Chess and Shogi with Uniformity Regularization Networks

Building evaluation functions for chess variants is a challenging goal. So far, only AlphaZero has succeeded, using millions of self-play records produced with thousands of tensor processing units (TPUs), which are not available to most researchers. This paper takes on the challenge of training evaluation functions based on deep convolutional neural networks with modest data and computing resources, where regularization is crucial because complex models trained on limited data are prone to overfitting. We present a novel training scheme that introduces a uniformity regularization (UR) network. In the proposed approach, a value network and a discriminator network share common convolutional layers and are trained simultaneously. Following the comparison training method, the loss functions of both networks are based on the difference between the score of a random move and that of an expert’s move. The value network is expected to give precise scores for all positions, while the discriminator makes qualitative evaluations of move pairs and acts as a regularizer that penalizes differences in evaluation results so that all samples are uniformly discriminated. Because of the shared layers, this regularization improves the overall accuracy of the value network. Experimental results for chess and shogi demonstrate that the proposed method surpassed standard L2 regularization and helped obtain reasonably accurate value networks.
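To make the training scheme concrete, the following is a minimal PyTorch sketch of one possible reading of the approach described above: a shared convolutional trunk feeding a value head and a discriminator head, trained jointly on (expert move, random move) pairs. The layer sizes, the 18-plane 8x8 input encoding, the logistic comparison loss, and the variance-based uniformity term are illustrative assumptions, not the exact definitions used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class URModel(nn.Module):
    """Value network and discriminator sharing convolutional layers (sketch)."""

    def __init__(self, in_planes=18, channels=64, board=8):
        super().__init__()
        # Shared convolutional layers used by both heads.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Value head: scalar score for a single position.
        self.value_head = nn.Sequential(
            nn.Flatten(), nn.Linear(channels * board * board, 256),
            nn.ReLU(), nn.Linear(256, 1),
        )
        # Discriminator head: qualitative judgement on an (expert, random) pair.
        self.disc_head = nn.Sequential(
            nn.Flatten(), nn.Linear(2 * channels * board * board, 256),
            nn.ReLU(), nn.Linear(256, 1),
        )

    def forward(self, pos_expert, pos_random):
        h_e, h_r = self.trunk(pos_expert), self.trunk(pos_random)
        v_e, v_r = self.value_head(h_e), self.value_head(h_r)
        d = self.disc_head(torch.cat([h_e, h_r], dim=1))
        return v_e, v_r, d


def ur_losses(v_e, v_r, d):
    # Comparison training: the position after the expert's move should score
    # higher than the one after the random move (logistic loss on the gap).
    value_loss = F.softplus(-(v_e - v_r)).mean()
    # Discriminator: every pair should be recognised as an expert preference.
    disc_loss = F.binary_cross_entropy_with_logits(d, torch.ones_like(d))
    # Uniformity term (assumption): penalise spread in the discriminator's
    # confidence so all samples are uniformly discriminated.
    uniformity = torch.sigmoid(d).var()
    return value_loss + disc_loss + uniformity


if __name__ == "__main__":
    model = URModel()
    pos_expert = torch.randn(4, 18, 8, 8)
    pos_random = torch.randn(4, 18, 8, 8)
    loss = ur_losses(*model(pos_expert, pos_random))
    loss.backward()
    print(float(loss))
```

Because the uniformity term back-propagates through the shared trunk, any benefit the discriminator gains from treating all pairs consistently is also available to the value head, which is the mechanism the abstract attributes the accuracy improvement to.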
