Multi-agent Reinforcement Learning in Sequential Social Dilemmas

Matrix games like Prisoner's Dilemma have guided research on social dilemmas for decades. However, they necessarily treat the choice to cooperate or defect as an atomic action. In real-world social dilemmas these choices are temporally extended. Cooperativeness is a property that applies to policies, not elementary actions. We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions. We analyze the dynamics of policies learned by multiple self-interested independent learning agents, each using its own deep Q-network, on two Markov games we introduce here: (1) a fruit Gathering game and (2) a Wolfpack hunting game. We characterize how learned behavior in each domain changes as a function of environmental factors, including resource abundance. Our experiments show how conflict can emerge from competition over shared resources and shed light on how the sequential nature of real-world social dilemmas affects cooperation.
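For readers unfamiliar with the formal setting, the "mixed incentive structure" referred to above is conventionally captured by inequalities over the four canonical payoffs of a two-player matrix game: R (reward for mutual cooperation), P (punishment for mutual defection), S (the sucker's payoff for cooperating against a defector), and T (the temptation payoff for defecting against a cooperator). The block below is a standard formulation of these conditions, not a quotation from the paper:

```latex
% Standard matrix-game social dilemma conditions (notation assumed,
% not quoted from the paper).
\begin{align*}
  R  &> P,            && \text{mutual cooperation beats mutual defection}\\
  R  &> S,            && \text{mutual cooperation beats being exploited}\\
  2R &> T + S,        && \text{mutual cooperation beats alternating exploitation}\\
  T  &> R \;\text{(greed)} \quad\text{or}\quad P > S \;\text{(fear)}.
                      && \text{some incentive to defect exists}
\end{align*}
```

To make the contrast with the sequential setting concrete, here is a minimal, self-contained sketch of independent Q-learning in an iterated matrix-game dilemma, where cooperate/defect is a single atomic action. This is an illustration of the baseline the paper argues against, not the paper's own method: the experiments described above instead give each agent its own deep Q-network acting in gridworld Markov games. All payoff values and hyperparameters below are illustrative assumptions.

```python
# Two independent epsilon-greedy Q-learners repeatedly playing a
# one-shot Prisoner's Dilemma. Each agent sees only its own reward;
# the other learner is just part of a non-stationary environment.
import random

# Canonical payoffs satisfying T > R > P > S and 2R > T + S (assumed values).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {  # (my action, opponent action) -> my reward; 0 = cooperate, 1 = defect
    (0, 0): R, (0, 1): S,
    (1, 0): T, (1, 1): P,
}

q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]; the game is stateless
alpha, epsilon = 0.1, 0.1

def act(agent):
    """Epsilon-greedy action selection over the agent's own Q-values."""
    if random.random() < epsilon:
        return random.randrange(2)
    return 0 if q[agent][0] >= q[agent][1] else 1

for step in range(50_000):
    a0, a1 = act(0), act(1)
    # Stateless Q-update: move each Q-value toward the reward just received.
    q[0][a0] += alpha * (PAYOFF[(a0, a1)] - q[0][a0])
    q[1][a1] += alpha * (PAYOFF[(a1, a0)] - q[1][a1])

print("Agent 0 Q-values (C, D):", q[0])
print("Agent 1 Q-values (C, D):", q[1])  # defection typically dominates
```

In this atomic-action setting, defection strictly dominates and independent learners settle on mutual defection. The point of the sequential social dilemmas introduced in the paper is that the analogous "defection" is not a single action but must be implemented by a temporally extended policy, which is what the Gathering and Wolfpack experiments study.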
