A multi-agent reinforcement learning model of common-pool resource appropriation

Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria, a phenomenon called the tragedy of the commons. In reality, however, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover the aspects of human behavior that make this possible. Most of that work was based on laboratory experiments in which participants make only a single choice: how much to appropriate. Recognizing the importance of spatial and temporal resource dynamics, a recent trend has been toward experiments in more complex, real-time, video-game-like environments. However, the standard methods of non-cooperative game theory can no longer be used to generate predictions in this setting. Here we show that deep reinforcement learning can be used instead. To that end, we study the emergent behavior of groups of independently learning agents in a partially observed Markov game modeling common-pool resource appropriation. Our experiments highlight the importance of trial-and-error learning in common-pool resource appropriation and shed light on the relationship between exclusion, sustainability, and inequality.
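
The abstract does not say how inequality among agents is quantified. One common choice, assumed here purely for illustration, is the Gini coefficient computed over per-agent returns. The following is a minimal sketch under that assumption; the function name `gini` and the idea of feeding it per-agent episode returns are hypothetical and not taken from the paper.

```python
def gini(returns):
    """Gini coefficient of non-negative per-agent returns.

    Returns 0.0 for perfect equality; for n agents the maximum is (n - 1) / n,
    reached when a single agent collects everything.
    Assumption: inputs are non-negative (e.g. counts of resource units collected).
    """
    values = sorted(float(r) for r in returns)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        # No agents, or nothing was collected: treat as perfectly equal.
        return 0.0
    # Rank-based form of the mean absolute difference:
    # G = sum_i (2i - n - 1) * x_i / (n * sum(x)), with x sorted ascending, i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))    # equal shares
    print(gini([0, 0, 0, 10]))   # one agent excludes the others
```

Under this measure, a group in which one agent excludes all others from the resource scores close to the maximum, while an even split scores zero.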
