Grandmaster level in StarCraft II using multi-agent reinforcement learning
Wojciech M. Czarnecki | Oriol Vinyals | I. Babuschkin | Michaël Mathieu | Andrew Dudzik | Junyoung Chung | David H. Choi | Richard Powell | Timo Ewalds | Petko Georgiev | Junhyuk Oh | Dan Horgan | M. Kroiss | Ivo Danihelka | Aja Huang | L. Sifre | Trevor Cai | J. Agapiou | Max Jaderberg | A. Vezhnevets | Rémi Leblond | Tobias Pohlen | Valentin Dalibard | D. Budden | Yury Sulsky | James Molloy | T. Paine | Caglar Gulcehre | Ziyun Wang | T. Pfaff | Yuhuai Wu | Roman Ring | Dani Yogatama | Dario Wünsch | Katrina McKinney | Oliver Smith | T. Schaul | T. Lillicrap | K. Kavukcuoglu | D. Hassabis | C. Apps | D. Silver
[1] A. Elo. The Rating of Chessplayers, Past and Present, 1978.
[2] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Doina Precup, et al. Eligibility Traces for Off-Policy Policy Evaluation, 2000, ICML.
[4] Michael Buro. ORTS: A Hack-Free RTS Game Environment, 2002, Computers and Games.
[5] Murray Campbell, et al. Deep Blue, 2002, Artif. Intell.
[6] Michael Buro, et al. Real-Time Strategy Games: A New AI Research Challenge, 2003, IJCAI.
[7] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[8] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[9] David S. Leslie, et al. Generalised weakened fictitious play, 2006, Games Econ. Behav.
[10] Chuen-Tsai Sun, et al. Building a player strategy model by analyzing replays of real-time strategy games, 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).
[11] Michael Mateas, et al. Case-Based Reasoning for Build Order in Real-Time Strategy Games, 2009, AIIDE.
[12] Ben George Weber. AIIDE 2010 StarCraft Competition, 2010, AIIDE.
[13] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[14] Pierre Bessière, et al. A Bayesian Model for Plan Recognition in RTS Games Applied to StarCraft, 2011, AIIDE.
[15] Gabriel Synnaeve, et al. A Bayesian model for opening prediction in RTS games with application to StarCraft, 2011, 2011 IEEE Conference on Computational Intelligence and Games (CIG'11).
[16] Geoffrey E. Hinton, et al. Distilling the Knowledge in a Neural Network, 2015, ArXiv.
[17] David Silver, et al. Fictitious Self-Play in Extensive-Form Games, 2015, ICML.
[18] Navdeep Jaitly, et al. Pointer Networks, 2015, NIPS.
[19] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[20] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[21] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[22] Santiago Ontañón, et al. Improving Monte Carlo Tree Search Policies in StarCraft via Probabilistic Models Learned from Replay Data, 2014, AIIDE.
[23] Ruslan Salakhutdinov, et al. Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning, 2015, ICLR.
[24] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Kyung-Joong Kim, et al. StarCraft AI Competition Report, 2016, AI Mag.
[26] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[27] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[28] Nicolas Usunier, et al. Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks, 2016, ArXiv.
[29] Navdeep Jaitly, et al. Discrete Sequential Prediction of Continuous Actions for Deep RL, 2017, ArXiv.
[30] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[31] Alexei A. Efros, et al. Curiosity-Driven Exploration by Self-Supervised Prediction, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[32] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[33] David Churchill, et al. An Analysis of Model-Based Heuristic Search Techniques for StarCraft Combat Scenarios, 2017, AIIDE Workshops.
[34] Shane Legg, et al. Deep Reinforcement Learning from Human Preferences, 2017, NIPS.
[35] Tom Schaul, et al. StarCraft II: A New Challenge for Reinforcement Learning, 2017, ArXiv.
[36] Nando de Freitas, et al. Sample Efficient Actor-Critic with Experience Replay, 2016, ICLR.
[37] Sebastian Risi, et al. Learning Macromanagement in StarCraft from Replays Using Deep Learning, 2017, 2017 IEEE Conference on Computational Intelligence and Games (CIG).
[38] David Silver, et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, 2017, NIPS.
[39] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[40] Bo Li, et al. TStarBots: Defeating the Cheating Level Builtin AI in StarCraft II in the Full Game, 2018, ArXiv.
[41] Nicolas Usunier, et al. Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger, 2018, NeurIPS.
[42] Shane Legg, et al. Reward learning from human preferences and demonstrations in Atari, 2018, NeurIPS.
[43] Razvan Pascanu, et al. Relational Deep Reinforcement Learning, 2018, ArXiv.
[44] Aaron C. Courville, et al. FiLM: Visual Reasoning with a General Conditioning Layer, 2017, AAAI.
[45] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[46] Marcin Andrychowicz, et al. Overcoming Exploration in Reinforcement Learning with Demonstrations, 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).
[47] Satinder Singh, et al. Self-Imitation Learning, 2018, ICML.
[48] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[49] Dongbin Zhao, et al. StarCraft Micromanagement With Reinforcement Learning and Curriculum Transfer Learning, 2018, IEEE Transactions on Emerging Topics in Computational Intelligence.
[50] Max Jaderberg, et al. Open-ended Learning in Symmetric Zero-sum Games, 2019, ICML.
[51] Shimon Whiteson, et al. The StarCraft Multi-Agent Challenge, 2019, AAMAS.
[52] Guy Lever, et al. Human-level performance in 3D multiplayer games with population-based reinforcement learning, 2018, Science.