A Study of AI Population Dynamics with Million-agent Reinforcement Learning

We conduct an empirical study of the ordered collective dynamics that emerge in a population of intelligent agents driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world whose laws are designed using only findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). To scale the population size up to millions of agents, we propose a large-scale DRL training platform with a redesigned experience buffer. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation when studying how the agents' grouping behaviors change with environmental resources. Both findings can be explained by the theory of self-organization in nature.
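
For readers unfamiliar with the Lotka-Volterra model mentioned above, the following is a minimal sketch of the classical two-species equations, dx/dt = alpha*x - beta*x*y for prey and dy/dt = delta*x*y - gamma*y for predators, integrated with a fourth-order Runge-Kutta step. It illustrates the reference model only and is not the reinforcement learning simulation described in the abstract. The function names, parameter values, and initial populations are illustrative assumptions and do not come from the paper.

```python
import math


def lotka_volterra_step(prey, predator, alpha, beta, delta, gamma, dt):
    """Advance the classical Lotka-Volterra equations by one RK4 step."""

    def deriv(x, y):
        # Prey grow at rate alpha and are eaten at rate beta * y.
        # Predators grow at rate delta * x and die at rate gamma.
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    k1x, k1y = deriv(prey, predator)
    k2x, k2y = deriv(prey + 0.5 * dt * k1x, predator + 0.5 * dt * k1y)
    k3x, k3y = deriv(prey + 0.5 * dt * k2x, predator + 0.5 * dt * k2y)
    k4x, k4y = deriv(prey + dt * k3x, predator + dt * k3y)
    new_prey = prey + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    new_predator = predator + dt / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return new_prey, new_predator


def simulate(prey0, predator0, steps, dt=0.01,
             alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Return a list of (prey, predator) pairs; parameter values are assumed."""
    trajectory = [(prey0, predator0)]
    prey, predator = prey0, predator0
    for _ in range(steps):
        prey, predator = lotka_volterra_step(
            prey, predator, alpha, beta, delta, gamma, dt)
        trajectory.append((prey, predator))
    return trajectory


def conserved_quantity(prey, predator,
                       alpha=1.0, beta=0.1, delta=0.075, gamma=1.5):
    """Invariant of the exact dynamics; it stays constant along each orbit."""
    return (delta * prey - gamma * math.log(prey)
            + beta * predator - alpha * math.log(predator))


if __name__ == "__main__":
    path = simulate(prey0=10.0, predator0=5.0, steps=2000)
    for step in range(0, len(path), 400):
        prey, predator = path[step]
        print(f"t={step * 0.01:5.2f}  prey={prey:8.3f}  predator={predator:8.3f}")
```

Under these assumed parameters the two populations cycle around the fixed point (gamma/delta, alpha/beta) = (20, 10), with the predator peak lagging the prey peak. This cyclic, phase-shifted pattern is the kind of ordered dynamics the abstract compares the agent populations against.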
