Learning Existing Social Conventions in Markov Games

In order for artificial agents to coordinate effectively with people, they must act consistently with existing conventions (e.g. how to navigate in traffic, which language to speak, or how to work with teammates). A group's conventions can be viewed as a choice of equilibrium in a coordination game. We consider the problem of an agent learning a policy for a coordination game in a simulated environment and then using this policy when it enters an existing group. When there are multiple possible conventions, we show that learning a policy via multi-agent reinforcement learning (MARL) is likely to find policies that achieve high payoffs at training time but fail to coordinate with the real group the agent enters. We assume access to a small number of samples of behavior from the true convention and show that the MARL objective can be augmented with these samples to help it find policies consistent with the real group's convention. In three environments from the literature (traffic, communication, and team coordination), we observe that augmenting MARL with a small amount of imitation learning greatly increases the probability that the strategy found by MARL fits well with the existing social convention. We show that this works even in an environment where standard training methods very rarely find the true convention of the agent's partners.
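
The augmented objective the abstract describes can be made concrete with a small sketch. The following is a minimal illustration, not the paper's implementation: it assumes a toy one-shot matching game (payoff 1 when two players choose the same action), uses a plain REINFORCE estimator for the self-play term, and adds a behavioral-cloning negative log-likelihood on a handful of observed actions from the group's convention. The Policy class, the bc_weight coefficient, and the game itself are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_ACTIONS = 10  # toy coordination game: payoff 1 iff both players match

class Policy(nn.Module):
    """Policy for a stateless one-shot game: a learnable logit vector."""
    def __init__(self, n_actions=N_ACTIONS):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_actions))

    def dist(self):
        return torch.distributions.Categorical(logits=self.logits)

def augmented_loss(policy, demo_actions, bc_weight=1.0):
    """Self-play policy-gradient loss plus a behavioral-cloning term
    computed on a small batch of observed convention play."""
    d = policy.dist()
    # Self-play term: two copies of the agent each sample an action and
    # are rewarded for matching (a REINFORCE estimator, no baseline).
    a1, a2 = d.sample(), d.sample()
    reward = (a1 == a2).float()
    pg_loss = -(d.log_prob(a1) + d.log_prob(a2)) * reward
    # Imitation term: negative log-likelihood of the demonstrations.
    bc_loss = -d.log_prob(demo_actions).mean()
    return pg_loss + bc_weight * bc_loss

# Suppose the real group's convention is to always play action 3, and we
# have observed eight samples of that behavior.
demos = torch.full((8,), 3, dtype=torch.long)
policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=0.1)
for _ in range(500):
    opt.zero_grad()
    augmented_loss(policy, demos).backward()
    opt.step()
print(policy.dist().probs)  # mass should concentrate on action 3
```

With bc_weight set to zero, the self-play term alone is indifferent among the ten symmetric conventions, so training is equally likely to converge to any of them; the imitation term breaks that symmetry toward the convention actually observed, which is the mechanism the abstract's claim relies on.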
