Better Computer Go Player with Neural Network and Long-term Prediction

Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go's high branching factor makes traditional search techniques ineffective even on leading-edge hardware, and its evaluation function can change drastically with a single stone. Recent work [Maddison et al. (2015); Clark & Storkey (2015)] shows that search is not strictly necessary for machine Go players: a pure pattern-matching approach, based on a Deep Convolutional Neural Network (DCNN) that predicts the next move, can perform as well as Monte Carlo Tree Search (MCTS)-based open-source Go engines such as Pachi [Baudis & Gailly (2012)] when the latter's search budget is limited. We extend this idea in our bot, named darkforest, which relies on a DCNN designed for long-term prediction: it is trained to predict the next few moves rather than only the immediate next move. Darkforest substantially improves the win rate of pattern-matching approaches against MCTS-based approaches, even when the MCTS engines are given looser search budgets. Against human players, its newest version, darkfores2, achieves a stable 3d level on the KGS Go Server as a ranked bot, a substantial improvement over the estimated 4k-5k rank that Clark & Storkey (2015) report for DCNN-based bots, based on games against other machine players. Adding MCTS to darkfores2 yields a much stronger player, named darkfmcts3: with 5k rollouts it beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level on the KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, CrazyStone) other than AlphaGo [Silver et al. (2016)]; and with 110k rollouts it won third place in the January KGS Go Tournament.
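To make the long-term-prediction idea concrete, the sketch below trains a small policy DCNN with one softmax head per future move and sums the per-step cross-entropy losses, giving the joint next-k-move training signal the abstract alludes to. This is a minimal illustration in PyTorch, not the darkforest implementation; the depth, channel count, feature-plane count, and the names LongTermPolicyNet and joint_loss are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 19    # Go board size
PLANES = 25   # input feature planes (stones, liberties, history, ...); count is illustrative
K = 3         # number of future moves predicted jointly

class LongTermPolicyNet(nn.Module):
    """Minimal policy DCNN with one classification head per future move."""
    def __init__(self, channels=64, depth=6):
        super().__init__()
        layers = [nn.Conv2d(PLANES, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        self.trunk = nn.Sequential(*layers)
        # one 1x1 conv head per predicted step, each scoring all 361 points
        self.heads = nn.ModuleList(nn.Conv2d(channels, 1, 1) for _ in range(K))

    def forward(self, x):
        h = self.trunk(x)
        # each head yields logits over the 361 board points
        return [head(h).flatten(1) for head in self.heads]

def joint_loss(logits_per_step, targets_per_step):
    # joint training signal: sum of cross-entropies over the next K moves
    return sum(F.cross_entropy(lg, tg)
               for lg, tg in zip(logits_per_step, targets_per_step))

# toy batch: random features and random target points for the next K moves
net = LongTermPolicyNet()
x = torch.randn(8, PLANES, BOARD, BOARD)
targets = [torch.randint(0, BOARD * BOARD, (8,)) for _ in range(K)]
loss = joint_loss(net(x), targets)
loss.backward()
```

In a sketch like this, only the first head's distribution would be used at play time to pick the next move (or to supply move priors to MCTS); the extra heads act as a training-time signal that pushes the shared features to encode longer-term plans.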

[1] Terrence J. Sejnowski, et al. Temporal Difference Learning of Position Evaluation in the Game of Go, 1993, NIPS.

[2] Petr Baudis, Jean-loup Gailly. PACHI: State of the Art Open Source Go Program, 2012, ACG.

[3] David Silver, et al. Reinforcement learning and simulation-based search in computer Go, 2009.

[4] Marco Platzner, et al. Adaptive Playouts in Monte-Carlo Tree Search with Policy-Gradient Reinforcement Learning, 2015, ACG.

[5] Chris J. Maddison, et al. Move Evaluation in Go Using Deep Convolutional Neural Networks, 2015, ICLR.

[6] M. Enzenberger. The Integration of A Priori Knowledge into a Go Playing Neural Network, 1996.

[7] Christopher Clark, Amos J. Storkey. Training Deep Convolutional Neural Networks to Play Go, 2015, ICML.

[8] David Silver, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[10] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.

[11] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Risto Miikkulainen, et al. Evolving Neural Networks to Play Go, 2004, Applied Intelligence.

[13] Martin Müller, et al. Fuego—An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search, 2010, IEEE Transactions on Computational Intelligence and AI in Games.

[14] Simon M. Lucas, et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[15] Csaba Szepesvári, et al. Bandit Based Monte-Carlo Planning, 2006, ECML.