Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences

Game-theoretic views of convention generally rest on notions of common knowledge and hyper-rational models of individual behavior. However, decades of work in behavioral economics have questioned the validity of both foundations. Meanwhile, computational neuroscience has contributed a modernized 'dual process' account of decision-making in which model-free (MF) reinforcement learning trades off with model-based (MB) reinforcement learning. The former captures habitual and procedural learning, while the latter captures choices made via explicit planning and deduction. Some conventions (e.g. international treaties) are likely supported by cognition that resonates with the game-theoretic and MB accounts. However, convention formation may also occur via MF mechanisms such as habit learning, though this possibility has been understudied. Here, we demonstrate that complex, large-scale conventions can emerge from MF learning mechanisms. This suggests that some conventions may be supported by habit-like cognition rather than explicit reasoning. We apply MF multi-agent reinforcement learning to a temporo-spatially extended game with incomplete information. In this game, large parts of the state space are reachable only by collective action. However, heterogeneity of tastes makes such coordinated action difficult: multiple equilibria are desirable for all players, but subgroups prefer a particular equilibrium over all others. This creates a coordination problem that can be solved by establishing a convention. We investigate the start-up and free-rider subproblems, as well as the effects of group size, intensity of intrinsic preference, and salience on the emergence dynamics of coordination conventions. Our simulations show that agents establish and switch between conventions, even working against their own preferred outcome when doing so is necessary for effective coordination.
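The core mechanism described above can be illustrated with a deliberately tiny sketch. A model-free learner caches action values from the rewards it experiences instead of planning over a model of the other players. This is not the authors' environment or agent. The paper uses a temporo-spatially extended game with incomplete information, whereas the code below assumes a repeated, one-shot N-player coordination game played by stateless, independent ε-greedy Q-learners. The function `simulate`, its payoff rule (share of matching partners, scaled up when an agent picks its own preferred option), and every parameter (`bonus`, `alpha`, the exploration schedule) are hypothetical choices made only for illustration.

```python
import random
from collections import Counter


def simulate(prefs, n_options=2, bonus=0.5, episodes=3000,
             alpha=0.1, eps_start=0.3, eps_end=0.01, seed=0):
    """Hypothetical toy: independent, stateless Q-learners in an N-player
    coordination game with heterogeneous preferences (requires >= 2 agents).

    prefs[i] is the option agent i intrinsically prefers. Each round every
    agent picks one option. Agent i's reward is the share of the *other*
    agents that picked the same option, multiplied by (1 + bonus) when that
    option is agent i's preferred one. Every unanimous outcome therefore pays
    everyone at least 1.0, but each subgroup earns more at "its" option.
    """
    rng = random.Random(seed)
    n = len(prefs)
    q = [[0.0] * n_options for _ in range(n)]  # cached values only, no model

    def act(i, eps):
        if rng.random() < eps:
            return rng.randrange(n_options)  # explore uniformly
        best = max(q[i])
        ties = [a for a in range(n_options) if q[i][a] == best]
        return rng.choice(ties)  # greedy, with random tie-breaking

    for t in range(episodes):
        # Linearly anneal exploration from eps_start down to eps_end.
        eps = eps_start + (eps_end - eps_start) * t / max(1, episodes - 1)
        choices = [act(i, eps) for i in range(n)]
        counts = Counter(choices)
        for i, a in enumerate(choices):
            share = (counts[a] - 1) / (n - 1)  # fraction of others matching i
            reward = share * (1 + bonus if a == prefs[i] else 1.0)
            # Model-free update: move the cached value toward the observed
            # reward; no transition or payoff model is ever learned.
            q[i][a] += alpha * (reward - q[i][a])

    final = [act(i, 0.0) for i in range(n)]  # greedy read-out after training
    modal_share = Counter(final).most_common(1)[0][1] / n
    return final, modal_share


if __name__ == "__main__":
    # Hypothetical split: 6 agents prefer option 0, 4 agents prefer option 1.
    prefs = [0] * 6 + [1] * 4
    choices, share = simulate(prefs)
    print(choices, share)
```

In this toy setting, a modal share near 1.0 in the greedy read-out indicates that the population has settled on a shared option, i.e. a convention. Varying the group split in `prefs` or the size of `bonus` is one way to probe how intensity of preference interacts with coordination. The design keeps each agent purely habit-like: it never represents what the others will do and only reinforces whichever option has paid off.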
