Social Diversity and Social Preferences in Mixed-Motive Reinforcement Learning

Recent research on reinforcement learning in pure-conflict and pure-common-interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games, namely the imperfect correlation of incentives between group members, we study the effect of population heterogeneity on mixed-motive reinforcement learning. We draw on interdependence theory from social psychology and imbue reinforcement learning agents with Social Value Orientation (SVO), a flexible formalization of preferences over group outcome distributions. We subsequently explore the effects of diversity in SVO on populations of reinforcement learning agents in two mixed-motive Markov games. We demonstrate that heterogeneity in SVO generates meaningful and complex behavioral variation among agents, similar to the variation suggested by interdependence theory. Empirical results in these mixed-motive dilemmas suggest that agents trained in heterogeneous populations develop particularly generalized, high-performing policies relative to those trained in homogeneous populations.
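
For readers unfamiliar with SVO, the sketch below illustrates one common way such a preference can be operationalized as a reward-shaping term: each agent carries an angle theta, and its training signal blends its own reward with the mean reward of the other agents. This is a minimal illustration based on the angle-based SVO formulation from the social-psychology literature; the function name, the use of the group's mean reward as the "other" term, and the sampled angle range are assumptions made for this example rather than the exact implementation used in the study.

    import numpy as np

    def svo_utility(own_reward, other_rewards, theta):
        """Blend an agent's own reward with the group's mean reward.

        theta is the (hypothetical) SVO angle in radians: 0 gives a purely
        selfish agent, pi/4 an agent weighing self and group equally, and
        pi/2 a fully altruistic agent.
        """
        group_mean = float(np.mean(other_rewards))
        return own_reward * np.cos(theta) + group_mean * np.sin(theta)

    # A heterogeneous population: each agent is assigned its own SVO angle.
    rng = np.random.default_rng(0)
    num_agents = 5
    svo_angles = rng.uniform(0.0, np.pi / 2, size=num_agents)

    # Per-step environment rewards (placeholder values for illustration).
    env_rewards = rng.normal(loc=1.0, scale=0.5, size=num_agents)

    # Each agent trains on its reshaped utility instead of the raw reward.
    shaped_rewards = [
        svo_utility(env_rewards[i], np.delete(env_rewards, i), svo_angles[i])
        for i in range(num_agents)
    ]
    print(shaped_rewards)

Under this reading, a heterogeneous population corresponds to assigning distinct angles to different agents, whereas a homogeneous population would give every agent the same angle.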
