Learning of Evaluation Functions via Self-Play Enhanced by Checkmate Search

As shown by AlphaGo, AlphaGo Zero, and AlphaZero, reinforcement learning is effective for learning evaluation functions (or value networks) in Go, Chess, and Shogi. Their training repeats two procedures in parallel: self-play with the current evaluation function, and improvement of that evaluation function using the game records produced by recent self-play. Although AlphaGo, AlphaGo Zero, and AlphaZero have achieved superhuman performance, this approach requires enormous computational resources. To alleviate this problem, we propose incorporating a checkmate solver into self-play. We show that this small enhancement dramatically improves learning efficiency in our Minishogi experiments by raising the quality of the game records generated during self-play. Note that our method remains free of human knowledge about the target domain, although the implementation of the checkmate solver is domain dependent.
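To make the idea concrete, the sketch below illustrates, under assumptions of our own, how a checkmate solver might be consulted at each self-play step before falling back to the learned evaluation function. The `position` and `solver` interfaces, as well as the helpers `self_play_game` and `select_move`, are hypothetical stand-ins and not the paper's actual implementation.

```python
import random


def self_play_game(position, evaluate, solver, max_moves=256):
    """Play one self-play game, preferring a proven mating move when available."""
    record = []  # (position, move, value target) tuples later used for training
    for _ in range(max_moves):
        if position.is_terminal():
            break
        mating_move = solver.solve(position)  # exact checkmate search (assumed API)
        if mating_move is not None:
            move, target = mating_move, 1.0   # side to move has a proven win
        else:
            move, target = select_move(position, evaluate)  # heuristic fallback
        record.append((position.copy(), move, target))
        position = position.apply(move)
    return record


def select_move(position, evaluate, epsilon=0.1):
    """One-ply lookahead with the current evaluation function, plus exploration."""
    moves = position.legal_moves()
    # evaluate() is assumed to score a position from the side to move's view,
    # so each child position's score is negated for the parent.
    scored = [(-evaluate(position.apply(m)), m) for m in moves]
    value, move = max(scored, key=lambda s: s[0])
    if random.random() < epsilon:  # occasional random move for exploration
        move = random.choice(moves)
    return move, value
```

In this sketch, positions decided by the solver receive exact win labels instead of heuristic estimates, which is one way the checkmate search could improve the quality of the training records.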