Inequity aversion improves cooperation in intertemporal social dilemmas

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics can explain this phenomenon only for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity-averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
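
The abstract does not state the utility function, so the following is only a minimal sketch of the matrix-game form of inequity aversion commonly attributed to Fehr and Schmidt, not the Markov-game extension developed in the paper. Each agent's utility is its own reward minus a penalty for earning less than others (disadvantageous inequity, weight `alpha`) and a penalty for earning more than others (advantageous inequity, weight `beta`). The function name, parameter names, and example values are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def inequity_averse_utilities(
    rewards: Sequence[float],
    alpha: float = 1.0,  # hypothetical weight on disadvantageous inequity
    beta: float = 0.5,   # hypothetical weight on advantageous inequity
) -> List[float]:
    """Return one subjective utility per agent for a single stateless game.

    Assumed form (standard matrix-game inequity aversion):
        U_i = r_i - alpha / (N - 1) * sum_{j != i} max(r_j - r_i, 0)
                  - beta  / (N - 1) * sum_{j != i} max(r_i - r_j, 0)
    """
    n = len(rewards)
    if n < 2:
        # With a single agent there is nobody to compare against.
        return [float(r) for r in rewards]

    utilities = []
    for i, r_i in enumerate(rewards):
        others = [r_j for j, r_j in enumerate(rewards) if j != i]
        # How far the other agents are ahead of agent i.
        disadvantage = sum(max(r_j - r_i, 0.0) for r_j in others)
        # How far agent i is ahead of the other agents.
        advantage = sum(max(r_i - r_j, 0.0) for r_j in others)
        utilities.append(
            r_i - alpha * disadvantage / (n - 1) - beta * advantage / (n - 1)
        )
    return utilities
```

For example, calling `inequity_averse_utilities([3.0, 1.0])` lowers the utility of both agents relative to their raw rewards: the higher earner is penalized through `beta` and the lower earner through `alpha`. When all rewards are equal, the utilities equal the rewards.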
