Alternative Multitask Training for Evaluation Functions in the Game of Go

For the games of Go, Chess, and Shogi (Japanese chess), deep neural networks (DNNs) have contributed to building accurate evaluation functions, and many studies have attempted to create the so-called value network, which predicts the expected reward of a given state. A recent study of the value network for the game of Go has shown that a two-headed neural network trained with two different objectives can be trained effectively and performs better than a single-headed network. One of the two heads, the value head, predicts the reward, while the other, the policy head, predicts the next move at a given state. This multitask training makes the network more robust and improves its generalization performance. In this paper, we show that a simple discriminator network is an alternative target for multitask learning. Compared to existing deep neural networks, our proposed network is easier to design because of its simple output. Our experimental results show that our discriminative target also stabilizes learning, and that the evaluation function trained by our method is comparable to those of existing studies in terms of next-move prediction accuracy and playing strength.
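As a rough illustration of the two-headed architecture described above, the following sketch pairs a shared convolutional trunk with a value head and a swappable auxiliary head: `aux="policy"` mirrors the existing multitask setup, while `aux="discriminator"` stands in for the simpler discriminative target this paper proposes. All layer sizes, the 17 input feature planes, and the discriminator's single-probability output are illustrative assumptions; the abstract does not specify the actual architecture or what the discriminator distinguishes.

```python
import torch
import torch.nn as nn

BOARD = 19  # standard Go board size

class TwoHeadedNet(nn.Module):
    """Minimal sketch of a two-headed evaluation network (assumed sizes).

    A shared trunk feeds a value head (reward prediction) and an
    auxiliary head trained with a second objective, as in the
    multitask setup the abstract describes.
    """

    def __init__(self, channels=64, planes=17, aux="policy"):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Value head: scalar reward estimate in [-1, 1].
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(BOARD * BOARD, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )
        if aux == "policy":
            # Policy head: one logit per board point plus pass.
            self.aux_head = nn.Sequential(
                nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
                nn.Linear(2 * BOARD * BOARD, BOARD * BOARD + 1),
            )
        else:
            # Discriminator head: a single probability, hence a much
            # simpler output than the 362-way policy distribution.
            self.aux_head = nn.Sequential(
                nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
                nn.Linear(BOARD * BOARD, 1), nn.Sigmoid(),
            )

    def forward(self, x):
        h = self.trunk(x)
        return self.value_head(h), self.aux_head(h)

# Example: a batch of 8 positions encoded as 17 feature planes.
net = TwoHeadedNet(aux="discriminator")
value, aux = net(torch.zeros(8, 17, BOARD, BOARD))
print(value.shape, aux.shape)  # torch.Size([8, 1]) torch.Size([8, 1])
```

In either configuration, training would minimize a weighted sum of a value loss (e.g. mean squared error against the game outcome) and a loss on the auxiliary head; the design point the abstract emphasizes is that a discriminator's single output makes that auxiliary head easier to design than a full move distribution.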