Understanding collective behaviors in reinforcement learning evolutionary games via a belief-based formalization

Self-organized collective behaviors are ubiquitous in nature and human society, and extensive efforts have been made to explore the mechanisms behind them. Artificial intelligence (AI), as a rapidly developing field, has great potential for this task. By combining reinforcement learning with evolutionary games (RLEGs), we numerically discover a rich spectrum of collective behaviors, including explosive events, oscillations, and stable states, that are also often observed in human society. In this work, we aim to provide a theoretical framework for investigating RLEGs systematically. Specifically, we formalize AI agents' learning processes in terms of belief switches and behavior modes, where a behavior mode is defined as a series of actions following beliefs. Building on preliminary results in a time-independent environment, we investigate the stability of mixed equilibrium points in RLEGs in general, at which each agent resides in one of the optimal behavior modes. Moreover, we adopt the maximum entropy principle to infer the composition of agents residing in each mode at a strictly stable point. Applied to the 2×2 game setting, the theoretical analysis explains the uncovered collective behaviors and allows equivalent systems to be constructed intuitively. The inferred composition of the different modes is also consistent with simulations. Our work may help in understanding related collective emergence in human society as well as behavioral patterns at the individual level, and may facilitate human-computer interactions in the future.
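
To make the setting concrete, the following is a minimal sketch of reinforcement learning agents repeatedly playing a 2×2 game in a well-mixed population. It is not the authors' model: the tabular Q-learning update, the choice of each agent's previous action as its state, the random pairing, the payoff values in `PAYOFF`, and all parameter defaults are assumptions made purely for illustration.

```python
import random

# Hypothetical 2x2 payoff matrix (snowdrift-like), not taken from the paper.
# Rows: my action, columns: opponent's action. 0 = cooperate, 1 = defect.
PAYOFF = [[3.0, 1.0],
          [4.0, 0.0]]


def simulate(n_agents=100, steps=2000, alpha=0.1, gamma=0.9,
             epsilon=0.02, seed=0):
    """Return the fraction of cooperators at each step.

    Assumed setup: each agent keeps a 2x2 Q-table indexed by
    (own previous action, next action) and is paired at random each round.
    """
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.randrange(2) for _ in range(n_agents)]
    coop_fraction = []

    for _ in range(steps):
        # Epsilon-greedy action selection from each agent's Q-table.
        actions = []
        for i in range(n_agents):
            if rng.random() < epsilon:
                actions.append(rng.randrange(2))
            else:
                row = q[i][state[i]]
                actions.append(0 if row[0] >= row[1] else 1)

        # Pair agents at random; with an odd population one agent sits out.
        order = list(range(n_agents))
        rng.shuffle(order)
        for k in range(0, n_agents - 1, 2):
            a, b = order[k], order[k + 1]
            for me, other in ((a, b), (b, a)):
                reward = PAYOFF[actions[me]][actions[other]]
                s, act = state[me], actions[me]
                # Standard Q-learning update; the next state is the action just taken.
                target = reward + gamma * max(q[me][act])
                q[me][s][act] += alpha * (target - q[me][s][act])

        state = actions
        coop_fraction.append(actions.count(0) / n_agents)

    return coop_fraction
```

Plotting the returned series for different payoff matrices or learning parameters is one way to look for qualitatively different regimes; which regimes appear under these assumed settings is not something this sketch guarantees.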
