Open-Ended Learning Leads to Generally Capable Agents

Figure 1 | Example zero-shot behaviour of an agent playing a Capture the Flag task at test time. The agent has been trained on 700k games but has never experienced any Capture the Flag games during training. The red player's goal is to place both the purple cube (the opponent's cube) and the black cube (its own cube) onto its base (the grey floor), while the blue player tries to place them on the blue floor; the cubes serve as flags. The red player finds the opponent's cube and brings it back to its own cube at its base, at which point the agent receives reward. Shortly afterwards, the opponent, played by another agent, tags the red player and takes the cube back.
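
The caption specifies the task's reward condition: each player is rewarded when both cubes sit on its own base floor. As a rough illustration only, the sketch below expresses that condition as a simple predicate over object-floor relations; the state representation, relation names, and per-step reward of 1.0 are hypothetical stand-ins for exposition, not the paper's actual goal syntax or environment API.

```python
# Minimal sketch of the Capture the Flag reward condition described in the caption.
# All names here (WorldState, floor labels, reward values) are illustrative assumptions.

from dataclasses import dataclass


@dataclass(frozen=True)
class WorldState:
    """Hypothetical minimal state: which floor each cube currently rests on."""
    purple_cube_floor: str  # e.g. "grey", "blue", or "none"
    black_cube_floor: str


def red_player_reward(state: WorldState) -> float:
    """Red's goal: both the opponent's (purple) cube and its own (black) cube
    are on red's base, the grey floor."""
    both_on_grey = (state.purple_cube_floor == "grey"
                    and state.black_cube_floor == "grey")
    return 1.0 if both_on_grey else 0.0


def blue_player_reward(state: WorldState) -> float:
    """Blue's goal mirrors red's, with the blue floor as its base."""
    both_on_blue = (state.purple_cube_floor == "blue"
                    and state.black_cube_floor == "blue")
    return 1.0 if both_on_blue else 0.0


if __name__ == "__main__":
    # The moment described in the caption: red has brought the purple cube
    # back to its own cube on the grey floor, so red is rewarded.
    state = WorldState(purple_cube_floor="grey", black_cube_floor="grey")
    print(red_player_reward(state))   # 1.0
    print(blue_player_reward(state))  # 0.0
```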
