Dota 2 with Large Scale Deep Reinforcement Learning

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems, such as long time horizons, imperfect information, and complex, continuous state-action spaces, challenges that will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champions (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.
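The abstract does not spell out the optimization details, but the full paper describes a PPO-style policy-gradient setup scaled to very large batches. As a rough, illustrative sketch of that kind of update, the snippet below computes a clipped surrogate loss over a batch of frames; the function name, batch fields, and clip_eps value are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a clipped PPO-style policy update, the kind of
# "existing reinforcement learning technique" scaled up in this work.
# All names (ppo_surrogate_loss, batch fields, clip_eps) are illustrative
# assumptions, not taken from the paper's code.
import numpy as np

def ppo_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective averaged over a batch of frames."""
    ratio = np.exp(logp_new - logp_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; return a loss to minimize.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch: in the real system each optimizer step consumes on the order of
# ~2 million frames gathered from self-play rollouts every 2 seconds.
rng = np.random.default_rng(0)
loss = ppo_surrogate_loss(
    logp_new=rng.normal(-1.0, 0.1, size=1024),
    logp_old=rng.normal(-1.0, 0.1, size=1024),
    advantages=rng.normal(0.0, 1.0, size=1024),
)
print(loss)
```

In a distributed setup like the one described, many rollout workers would fill such batches from self-play games while a pool of optimizer machines applies this update in parallel; that division of labor is the scaling story the abstract summarizes.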
