An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning

In this paper, we conduct an empirical study of the ordered collective dynamics exhibited by a population of artificial intelligence (AI) agents. Our intention is to place AI agents in a simulated natural context and then to understand the dynamics they induce at the population level. In particular, we aim to verify whether principles developed in the real world can also be used to understand an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world whose laws are designed solely from findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied in population biology. We further discover emergent behaviors of collective adaptation when studying how the agents' grouping behaviors change with environmental resources. Both findings could be explained by the theory of self-organization in nature.
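The abstract compares the learned population dynamics with the Lotka-Volterra model. For reference, the classical two-species form is dx/dt = αx − βxy and dy/dt = δxy − γy, where x is prey density, y is predator density, α is the prey growth rate, β the predation rate, δ the predator growth per prey eaten, and γ the predator death rate. The minimal sketch below integrates this textbook model with a fourth-order Runge-Kutta step. It is not the paper's million-agent simulation, and the function names and parameter values are illustrative assumptions.

```python
import math
from typing import List, Tuple


def lotka_volterra(x: float, y: float, alpha: float, beta: float,
                   delta: float, gamma: float) -> Tuple[float, float]:
    """Right-hand side of the classical Lotka-Volterra equations.

    x is the prey population and y the predator population.
    """
    dx = alpha * x - beta * x * y   # prey grow, and are eaten at rate beta*x*y
    dy = delta * x * y - gamma * y  # predators grow by eating prey, die at rate gamma
    return dx, dy


def simulate(x0: float, y0: float, alpha: float = 1.0, beta: float = 0.1,
             delta: float = 0.075, gamma: float = 1.5, dt: float = 0.01,
             steps: int = 5000) -> List[Tuple[float, float, float]]:
    """Integrate with classical RK4; returns a list of (t, prey, predator)."""
    params = (alpha, beta, delta, gamma)
    x, y = x0, y0
    traj = [(0.0, x, y)]
    for i in range(1, steps + 1):
        k1x, k1y = lotka_volterra(x, y, *params)
        k2x, k2y = lotka_volterra(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, *params)
        k3x, k3y = lotka_volterra(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, *params)
        k4x, k4y = lotka_volterra(x + dt * k3x, y + dt * k3y, *params)
        x += dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += dt / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
        traj.append((i * dt, x, y))
    return traj


def invariant(x: float, y: float, alpha: float = 1.0, beta: float = 0.1,
              delta: float = 0.075, gamma: float = 1.5) -> float:
    """Quantity conserved along exact Lotka-Volterra trajectories."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


if __name__ == "__main__":
    traj = simulate(10.0, 5.0)
    for t, x, y in traj[::500]:
        print(f"t={t:5.1f}  prey={x:7.2f}  predators={y:7.2f}")
```

With the default parameters the equilibrium is (γ/δ, α/β) = (20, 10), and a trajectory started at (10, 5) cycles around it, with predator peaks lagging prey peaks. Because the model conserves V = δx − γ ln x + βy − α ln y, checking that `invariant` stays nearly constant along the trajectory is a quick sanity test of the integrator.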
