Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research

Evolution has produced a multi-scale mosaic of interacting adaptive units. Innovations arise when perturbations push parts of the system away from stable equilibria into new regimes where previously well-adapted solutions no longer work. Here we explore the hypothesis that multi-agent systems sometimes display intrinsic dynamics, arising from competition and cooperation, that provide a naturally emergent curriculum, which we term an autocurriculum. The solution of one social task often begets new social tasks, continually generating novel challenges and thereby promoting innovation. Under certain conditions these challenges may become increasingly complex over time, demanding that agents accumulate ever more innovations.
