Efficiently Mastering the Game of NoGo with Deep Reinforcement Learning Supported by Domain Knowledge

Computer games have long been regarded as an important field of artificial intelligence (AI). The AlphaZero framework has been successful in the game of Go, beating top professional human players and becoming the baseline method in computer games. However, the AlphaZero training process requires tremendous computing resources, which makes AlphaZero-based AI difficult to develop. In this paper, we propose NoGoZero+, which improves the AlphaZero process, and apply it to NoGo, a game similar to Go. NoGoZero+ employs several innovative features to improve training speed and performance, and most of these improvement strategies are not specific to NoGo and can be transferred to other domains. We compare NoGoZero+ with the original AlphaZero process, and the results show that NoGoZero+ trains about six times as fast. Moreover, in our experiment, our agent beat the original AlphaZero agent with a score of 81:19 after being trained on only 20,000 self-play games, a small amount compared with the 120,000 self-play games consumed by the original AlphaZero. The NoGo program based on NoGoZero+ was the runner-up in the 2020 China Computer Game Championship (CCGC) despite limited resources, defeating many AlphaZero-based programs. Our code, pretrained models, and self-play datasets are publicly available. The ultimate goal of this paper is to provide exploratory insights and mature auxiliary tools that enable AI researchers and computer-game communities to study, test, and improve these promising state-of-the-art methods at a much lower computational cost.
