Inequity aversion resolves intertemporal social dilemmas

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity-averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
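As a rough illustration of what an inequity-averse preference looks like in the stateless matrix-game setting, the sketch below computes a subjective utility in the commonly used Fehr-Schmidt form: an agent's own payoff, minus a penalty for disadvantageous inequity (others earn more) and a penalty for advantageous inequity (the agent earns more). The function name `inequity_averse_utility` and the default values of `alpha` and `beta` are assumptions chosen for illustration; the abstract does not specify the parameterization or how the idea is carried over to Markov games.

```python
from typing import Sequence


def inequity_averse_utility(
    rewards: Sequence[float],
    i: int,
    alpha: float = 1.0,  # hypothetical weight on disadvantageous inequity
    beta: float = 0.5,   # hypothetical weight on advantageous inequity
) -> float:
    """Fehr-Schmidt style subjective utility of agent i for one payoff profile."""
    n = len(rewards)
    own = float(rewards[i])
    if n < 2:
        # With a single agent there is nobody to compare against.
        return own
    others = [float(r) for j, r in enumerate(rewards) if j != i]
    # Total amount by which the other agents out-earn agent i.
    disadvantage = sum(max(r - own, 0.0) for r in others)
    # Total amount by which agent i out-earns the other agents.
    advantage = sum(max(own - r, 0.0) for r in others)
    return own - alpha / (n - 1) * disadvantage - beta / (n - 1) * advantage


# Illustrative payoff profile: agent 0 takes more than agents 1 and 2.
payoffs = [4.0, 1.0, 1.0]
utilities = [inequity_averse_utility(payoffs, i) for i in range(len(payoffs))]
```

In this toy profile the high earner's utility is reduced by the advantageous-inequity term, while the low earners' utilities are reduced by the disadvantageous-inequity term. That is one way to read the abstract's description of individuals who are personally pro-social and also inclined to punish defectors.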
