Consequentialist conditional cooperation in social dilemmas with imperfect information

Social dilemmas, where mutual cooperation can lead to high payoffs but participants face incentives to cheat, are ubiquitous in multi-agent interaction. We wish to construct agents that cooperate with pure cooperators, avoid exploitation by pure defectors, and incentivize cooperation from the rest. However, a partner's actions are often only partially observed, and the consequences of individual actions can be hard to predict. We show that in a large class of games good strategies can be constructed by conditioning one's behavior solely on outcomes (i.e., one's past rewards). We call this consequentialist conditional cooperation. We show how to construct such strategies using deep reinforcement learning techniques and demonstrate, both analytically and experimentally, that they are effective in social dilemmas beyond simple matrix games. We also show the limitations of relying purely on consequences and discuss the need to understand both the consequences of an action and the intentions behind it.
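To make the idea of conditioning only on one's own past rewards concrete, the following is a minimal sketch in a repeated prisoner's dilemma. It is not the construction used in the paper, which relies on deep reinforcement learning. The payoff values, the `threshold` and `warmup` parameters, and the names `OutcomeConditionedAgent` and `play` are all assumptions made for illustration.

```python
import random

# Hypothetical prisoner's dilemma payoffs for the agent: (own action, partner action).
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}


class OutcomeConditionedAgent:
    """Cooperates while its own average reward stays at or above a threshold.

    The agent never looks at the partner's action, only at the rewards it
    has received so far.
    """

    def __init__(self, threshold=2.0, warmup=5):
        self.threshold = threshold  # assumed cutoff between the C,C and D,D payoffs
        self.warmup = warmup        # rounds of unconditional cooperation at the start
        self.total = 0.0
        self.rounds = 0

    def act(self):
        if self.rounds < self.warmup:
            return "C"
        average = self.total / self.rounds
        return "C" if average >= self.threshold else "D"

    def observe(self, reward):
        self.total += reward
        self.rounds += 1


def play(agent, partner_policy, rounds=200, flip_prob=0.0, seed=0):
    """Run a repeated game and return the agent's actions.

    With probability flip_prob the partner's executed action is flipped,
    which stands in for noisy or unobserved partner behavior.
    """
    rng = random.Random(seed)
    actions = []
    for _ in range(rounds):
        own = agent.act()
        other = partner_policy()
        if rng.random() < flip_prob:
            other = "D" if other == "C" else "C"
        agent.observe(PAYOFF[(own, other)])
        actions.append(own)
    return actions


if __name__ == "__main__":
    with_cooperator = play(OutcomeConditionedAgent(), lambda: "C", rounds=50)
    with_defector = play(OutcomeConditionedAgent(), lambda: "D", rounds=50)
    print("cooperation rate vs cooperator:", with_cooperator.count("C") / 50)
    print("cooperation rate vs defector:", with_defector.count("C") / 50)
```

Under these assumed payoffs, the sketch keeps cooperating with an unconditional cooperator and switches to defection against an unconditional defector once the warm-up rounds are over. Setting `flip_prob` above zero lets one explore how noise in outcomes affects a purely reward-based rule.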
