Malthusian Reinforcement Learning

Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, an increase in a subpopulation's average return drives a subsequent increase in its size, mirroring the relationship between preindustrial income levels and population growth that Thomas Malthus described in 1798. Malthusian RL harnesses the competitive pressures that arise as populations grow and shrink to push agents into regions of state and policy space that they could not otherwise reach. Furthermore, in environments with potential gains from specialization and division of labor, we show that Malthusian RL is better positioned to exploit such synergies than algorithms based on self-play.
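The fitness-linked size dynamics described above can be illustrated with a minimal sketch. It is not the update rule used in the paper, which the abstract does not specify. It assumes a fixed total population and a replicator-style rule in which each subpopulation's share grows with the exponential of its average return. The function name `malthusian_resize` and the parameters `total` and `rate` are hypothetical.

```python
import math


def malthusian_resize(sizes, avg_returns, total, rate=1.0):
    """Reallocate `total` individuals across subpopulations.

    Illustrative assumption (not from the paper): each subpopulation is
    weighted by size * exp(rate * average_return), so a higher average
    return leads to a larger share in the next generation.
    """
    # Subtract the maximum return for numerical stability; the shares
    # are unchanged because the common factor cancels on normalization.
    baseline = max(avg_returns)
    weights = [n * math.exp(rate * (r - baseline))
               for n, r in zip(sizes, avg_returns)]
    norm = sum(weights)
    shares = [w / norm * total for w in weights]

    # Round down, then give leftover individuals to the subpopulations
    # with the largest fractional remainders so sizes sum to `total`.
    new_sizes = [math.floor(s) for s in shares]
    leftover = total - sum(new_sizes)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - new_sizes[i],
                   reverse=True)
    for i in order[:leftover]:
        new_sizes[i] += 1
    return new_sizes


if __name__ == "__main__":
    # Two equally sized subpopulations; the first earns a higher return.
    print(malthusian_resize([10, 10], [1.0, 0.0], total=20))
```

Under this assumed rule, a subpopulation that earns more than its competitors grows at their expense, while equal returns leave the sizes unchanged. That shifting crowding is the kind of competitive pressure the abstract refers to.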
