Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL) by rewarding agents for having causal influence over other agents' actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken and computes their effect on the behavior of other agents. Actions that lead to larger changes in other agents' behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that the influence reward leads to enhanced coordination and communication in challenging social dilemma environments, markedly improving the learning curves of the deep RL agents and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling each agent to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
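The abstract does not spell out the formula, so the following is only a minimal sketch of one way the counterfactual influence reward could be computed for a single pair of agents with discrete actions. It assumes the reward for agent k is the KL divergence between agent j's policy conditioned on k's actual action and j's marginal policy, obtained by averaging over k's counterfactual actions. All names (`policy_k`, `cond_policy_j`, `influence_reward`, and so on) are hypothetical and chosen for illustration. Under that assumption, the expectation of the reward over k's own policy equals the mutual information between the two agents' actions, which is what the last two functions compare.

```python
import math
from typing import List


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in nats for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def marginal_policy(policy_k: List[float], cond_policy_j: List[List[float]]) -> List[float]:
    """Agent j's policy with agent k's action marginalized out.

    policy_k[a_k]            : probability that k takes action a_k (assumed given)
    cond_policy_j[a_k][a_j]  : probability that j takes a_j given k took a_k
                               (assumed to come from a learned model of j)
    """
    n_actions_j = len(cond_policy_j[0])
    return [
        sum(policy_k[a_k] * cond_policy_j[a_k][a_j] for a_k in range(len(policy_k)))
        for a_j in range(n_actions_j)
    ]


def influence_reward(a_k: int, policy_k: List[float], cond_policy_j: List[List[float]]) -> float:
    """Assumed influence of k's actual action a_k on j: KL(conditional || marginal)."""
    return kl_divergence(cond_policy_j[a_k], marginal_policy(policy_k, cond_policy_j))


def expected_influence(policy_k: List[float], cond_policy_j: List[List[float]]) -> float:
    """Influence reward averaged over k's own policy."""
    return sum(
        p_k * influence_reward(a_k, policy_k, cond_policy_j)
        for a_k, p_k in enumerate(policy_k)
    )


def mutual_information(policy_k: List[float], cond_policy_j: List[List[float]]) -> float:
    """I(A_k; A_j) computed directly from the joint distribution, in nats."""
    marginal = marginal_policy(policy_k, cond_policy_j)
    mi = 0.0
    for a_k, p_k in enumerate(policy_k):
        for a_j, p_j in enumerate(cond_policy_j[a_k]):
            joint = p_k * p_j
            if joint > 0.0:
                mi += joint * math.log(p_j / marginal[a_j])
    return mi


if __name__ == "__main__":
    policy_k = [0.5, 0.5]                      # hypothetical policy of agent k
    cond_policy_j = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical response of agent j
    print(influence_reward(0, policy_k, cond_policy_j))
    print(expected_influence(policy_k, cond_policy_j))
    print(mutual_information(policy_k, cond_policy_j))
```

If agent j's behavior does not depend on k's action (all rows of `cond_policy_j` identical), the reward in this sketch is zero for every action of k, which matches the intuition that only actions that change other agents' behavior count as influential.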
