Modelling Cooperation in Network Games with Spatio-Temporal Complexity

The real world is awash with multi-agent problems that require collective action by self-interested agents, from the routing of packets across a computer network to the management of irrigation systems. Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group. Given appropriate mechanisms describing agent interaction, groups may achieve socially beneficial outcomes, even in the face of short-term selfish incentives. In many cases, collective action problems possess an underlying graph structure, whose topology crucially determines the relationship between local decisions and emergent global effects. Such scenarios have received great attention through the lens of network games. However, this abstraction typically collapses important dimensions, such as geometry and time, that are relevant to the design of mechanisms promoting cooperation. In parallel work, multi-agent deep reinforcement learning has shown great promise in modelling the emergence of self-organized cooperation in complex gridworld domains. Here we apply this paradigm to graph-structured collective action problems. Using multi-agent deep reinforcement learning, we simulate an agent society under a variety of plausible mechanisms, finding clear transitions between different equilibria over time. We define analytic tools inspired by related literature to measure the social outcomes, and use these to draw conclusions about the efficacy of different environmental interventions. Our methods have implications for mechanism design in both human and artificial agent systems.
