Alternative Multitask Training for Evaluation Functions in the Game of Go

For the games of Go, Chess, and Shogi (Japanese chess), deep neural networks (DNNs) have contributed to building accurate evaluation functions, and many studies have attempted to create the so-called value network, which predicts the expected reward of a given state. A recent study of the value network for the game of Go has shown that a two-headed neural network trained with two different objectives can be trained effectively and performs better than a single-headed network. One of the two heads, the value head, predicts the reward, while the other, the policy head, predicts the next move at a given state. This multitask training makes the network more robust and improves its generalization performance. In this paper, we show that a simple discriminator network is an alternative target for multitask learning. Compared to existing deep neural networks, our proposed network is easier to design because of its simple output. Our experimental results show that our discriminative target also stabilizes learning, and that the evaluation function trained by our method is comparable to those of existing studies in terms of next-move prediction accuracy and playing strength.
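As a rough illustration of the two-headed architecture described above, the following sketch pairs a shared convolutional trunk with a value head and a swappable auxiliary head: `aux="policy"` mirrors the existing multitask setup, while `aux="discriminator"` stands in for the simpler discriminative target this paper proposes. All layer sizes, the 17 input feature planes, and the discriminator's single-probability output are illustrative assumptions; the abstract does not specify the actual architecture or what the discriminator distinguishes.

```python
import torch
import torch.nn as nn

BOARD = 19  # standard Go board size

class TwoHeadedNet(nn.Module):
    """Minimal sketch of a two-headed evaluation network (assumed sizes).

    A shared trunk feeds a value head (reward prediction) and an
    auxiliary head trained with a second objective, as in the
    multitask setup the abstract describes.
    """

    def __init__(self, channels=64, planes=17, aux="policy"):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Value head: scalar reward estimate in [-1, 1].
        self.value_head = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(BOARD * BOARD, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )
        if aux == "policy":
            # Policy head: one logit per board point plus pass.
            self.aux_head = nn.Sequential(
                nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
                nn.Linear(2 * BOARD * BOARD, BOARD * BOARD + 1),
            )
        else:
            # Discriminator head: a single probability, hence a much
            # simpler output than the 362-way policy distribution.
            self.aux_head = nn.Sequential(
                nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
                nn.Linear(BOARD * BOARD, 1), nn.Sigmoid(),
            )

    def forward(self, x):
        h = self.trunk(x)
        return self.value_head(h), self.aux_head(h)

# Example: a batch of 8 positions encoded as 17 feature planes.
net = TwoHeadedNet(aux="discriminator")
value, aux = net(torch.zeros(8, 17, BOARD, BOARD))
print(value.shape, aux.shape)  # torch.Size([8, 1]) torch.Size([8, 1])
```

In either configuration, training would minimize a weighted sum of a value loss (e.g. mean squared error against the game outcome) and a loss on the auxiliary head; the design point the abstract emphasizes is that a discriminator's single output makes that auxiliary head easier to design than a full move distribution.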