Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems

Multiagent reinforcement learning (MARL) has achieved remarkable success in solving various types of video games. A cornerstone of this success is the auto-curriculum framework, which shapes the learning process by continually creating new, challenging tasks for agents to adapt to, thereby facilitating the acquisition of new skills. To extend MARL methods to real-world domains outside of video games, we envision in this blue sky paper that maintaining a diversity-aware auto-curriculum is critical for successful MARL applications. Specifically, we argue that behavioural diversity is a pivotal, yet under-explored, component of real-world multiagent learning systems, and that significant work remains in understanding how to design a diversity-aware auto-curriculum. We list four open challenges for auto-curriculum techniques, which we believe deserve more attention from this community. Towards validating our vision, we recommend modelling realistic interactive behaviours in autonomous driving as an important test bed, and recommend the SMARTS benchmark.
