Human-level performance in 3D multiplayer games with population-based reinforcement learning

Artificial teamwork

Artificially intelligent agents are getting better and better at two-player games, but most real-world endeavors require teamwork. Jaderberg et al. designed a computer program that excels at playing the video game Quake III Arena in Capture the Flag mode, in which two multiplayer teams compete to capture the flags of the opposing team. The agents were trained by playing thousands of games, gradually learning successful strategies not unlike those favored by their human counterparts. The computer agents competed successfully against humans even when their reaction times were slowed to match those of humans. Science, this issue p. 859

Teams of artificial agents compete successfully against humans in the video game Quake III Arena in Capture the Flag mode.

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents is trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and a rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.
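To make the two-tier optimization concrete, the following minimal Python sketch mimics its structure: an inner loop of independent RL on each agent's learned internal reward, and an outer loop of population-based training that selects and mutates agents using only game outcomes. Everything here (the Agent class, the event names, the toy policy update, the exploit/explore rules) is an illustrative assumption, not the paper's actual For The Win (FTW) agent.

import random

POPULATION_SIZE = 8          # toy value; the paper trained a far larger population
GAME_EVENTS = ["flag_capture", "flag_pickup", "tag_opponent", "was_tagged"]

class Agent:
    """One population member: policy parameters plus its own learned
    internal reward weights and learning hyperparameters."""
    def __init__(self):
        self.policy = [random.gauss(0.0, 0.1) for _ in range(16)]  # stand-in for a neural network
        self.reward_weights = {e: random.uniform(-1.0, 1.0) for e in GAME_EVENTS}
        self.learning_rate = 10 ** random.uniform(-4.0, -3.0)
        self.fitness = 0.0   # e.g. an Elo-style rating estimated from recent matches

def inner_rl_step(agent, match_events):
    """Inner tier: ordinary RL against the agent's *internal* reward.
    A real system would run an actor-critic update on pixels; here a
    random perturbation scaled by the internal return stands in."""
    internal_return = sum(agent.reward_weights[e] for e in match_events)
    agent.policy = [w + agent.learning_rate * internal_return * random.gauss(0.0, 1.0)
                    for w in agent.policy]

def pbt_step(population):
    """Outer tier: population-based training. Weak agents copy (exploit)
    a strong agent's parameters, then perturb (explore) the copied
    reward weights and hyperparameters."""
    ranked = sorted(population, key=lambda a: a.fitness, reverse=True)
    n = max(1, len(ranked) // 4)
    for loser in ranked[-n:]:
        parent = random.choice(ranked[:n])
        loser.policy = list(parent.policy)
        loser.reward_weights = dict(parent.reward_weights)
        loser.learning_rate = parent.learning_rate * random.choice([0.8, 1.25])
        for e in loser.reward_weights:
            loser.reward_weights[e] += random.gauss(0.0, 0.1)

population = [Agent() for _ in range(POPULATION_SIZE)]
for generation in range(100):
    for agent in population:
        events = random.choices(GAME_EVENTS, k=20)   # fake match transcript
        inner_rl_step(agent, events)
        # Fitness must come from game outcomes (points/wins), never from the
        # internal reward, so reward learning cannot game the outer loop.
        agent.fitness = 0.9 * agent.fitness + 0.1 * random.random()
    pbt_step(population)

The key design point the sketch preserves is the separation of signals: each agent optimizes its own internal reward in the inner loop, while the outer loop selects and mutates those reward weights using only match outcomes.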
