ELF OpenGo: An Analysis and Open Reimplementation of AlphaZero

The AlphaGo, AlphaGo Zero, and AlphaZero series of algorithms are remarkable demonstrations of deep reinforcement learning's capabilities, achieving superhuman performance in the complex game of Go with progressively increasing autonomy. However, many obstacles remain in the research community's understanding and use of these promising approaches. Toward elucidating unresolved mysteries and facilitating future research, we propose ELF OpenGo, an open-source reimplementation of the AlphaZero algorithm. ELF OpenGo is the first open-source Go AI to convincingly demonstrate superhuman performance, with a perfect (20:0) record against global top professionals. We apply ELF OpenGo to conduct extensive ablation studies and to identify and analyze numerous interesting phenomena in both the model training and the gameplay inference procedures. Our code, models, self-play datasets, and auxiliary data are publicly available.
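
For context on what a reimplementation of AlphaZero entails, the following is a minimal Python sketch of the PUCT child-selection rule at the heart of AlphaZero-style Monte Carlo tree search. It is illustrative only, not ELF OpenGo's actual (C++) implementation; the Node structure, helper names, and the exploration constant c_puct are assumptions chosen for clarity.

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        prior: float                  # P(s, a) from the policy network
        visit_count: int = 0          # N(s, a)
        value_sum: float = 0.0        # sum of values backed up through this node
        children: dict = field(default_factory=dict)  # move -> Node

        def q_value(self) -> float:
            # Mean action value Q(s, a); treat unvisited nodes as 0.
            return self.value_sum / self.visit_count if self.visit_count else 0.0

    def select_child(node: Node, c_puct: float = 1.5):
        """Pick the child maximizing Q(s, a) + U(s, a), the PUCT rule of
        AlphaZero-style MCTS. c_puct is an illustrative constant, not the
        value used by ELF OpenGo."""
        total_visits = sum(c.visit_count for c in node.children.values())
        best_move, best_score = None, -math.inf
        for move, child in node.children.items():
            # U(s, a) = c_puct * P(s, a) * sqrt(sum_b N(s, b)) / (1 + N(s, a))
            u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
            score = child.q_value() + u
            if score > best_score:
                best_move, best_score = move, score
        return best_move, node.children[best_move]

In this formulation, the prior term steers early search toward moves the policy network favors, while the 1 / (1 + N) decay gradually hands control to the empirical value estimates as visit counts grow.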
