Deep reinforcement learning models the emergent dynamics of human cooperation

Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate. Laboratory experiments have extensively explored the first part of this process, demonstrating that a variety of social-cognitive mechanisms influence how much individuals choose to invest in group efforts. However, experimental research has been unable to shed light on how social-cognitive mechanisms contribute to the where and when of collective action. We leverage multi-agent deep reinforcement learning to model how one social-cognitive mechanism, the intrinsic motivation to achieve a good reputation, steers group behavior toward specific spatial and temporal strategies for collective action in a social dilemma. We also collect behavioral data from groups of human participants challenged with the same dilemma. The model accurately predicts the spatial and temporal patterns of group behavior: in this public goods dilemma, the intrinsic motivation for reputation catalyzes the development of a non-territorial, turn-taking strategy for coordinating collective action.
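To make the idea of an intrinsic reputation motive more concrete, here is one hypothetical way such a motive could enter an agent's reward in a public goods setting. It assumes, for illustration only, that reputation is judged by comparing an agent's contribution with the group average and that any shortfall is penalized with a weight `alpha`. The function name, the comparison rule, and the value of `alpha` are illustrative choices, not the model's actual specification.

```python
from typing import Sequence


def reputation_shaped_rewards(
    extrinsic: Sequence[float],
    contributions: Sequence[float],
    alpha: float = 0.5,
) -> list[float]:
    """Hypothetical reward shaping for a reputation motive.

    Each agent keeps its extrinsic reward but pays an intrinsic penalty
    proportional to how far its contribution falls below the group mean.
    The rule and the default alpha are illustrative assumptions.
    """
    n = len(contributions)
    if n == 0 or n != len(extrinsic):
        raise ValueError("need equal, non-empty reward and contribution lists")
    mean_contribution = sum(contributions) / n
    shaped = []
    for reward, contribution in zip(extrinsic, contributions):
        # Shortfall relative to the group average; zero at or above the mean.
        shortfall = max(0.0, mean_contribution - contribution)
        shaped.append(reward - alpha * shortfall)
    return shaped
```

For example, with extrinsic rewards of 1.0 for all three agents and contributions of 0.0, 3.0, and 3.0, the group mean is 2.0. The low contributor's shaped reward therefore drops to 0.0, while the other two agents keep 1.0. Under an assumption like this, a learning agent is pushed to avoid lagging behind its peers, even when contributing is costly in extrinsic terms.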
