Silly rules improve the capacity of agents to learn stable enforcement and compliance behaviors

How can societies learn to enforce and comply with social norms? Here we investigate the learning dynamics and emergence of compliance and enforcement of social norms in a foraging game, implemented in a multi-agent reinforcement learning setting. In this spatiotemporally extended game, individuals are incentivized to implement complex berry-foraging policies and punish transgressions against social taboos covering specific berry types. We show that agents benefit when eating poisonous berries is taboo, meaning the behavior is punished by other agents, as this helps overcome a credit-assignment problem in discovering delayed health effects. Critically, however, we also show that introducing an additional taboo, which results in punishment for eating a harmless berry, improves the rate and stability with which agents learn to punish taboo violations and comply with taboos. Counterintuitively, our results show that an arbitrary taboo (a "silly rule") can enhance social learning dynamics and achieve better outcomes in the middle stages of learning. We discuss the results in the context of studying normativity as a group-level emergent phenomenon.
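
The credit-assignment argument can be made concrete with a small worked example. What follows is a toy sketch and not the paper's environment. All numbers and names (`eat_poison_outcome`, `berry_reward`, `health_penalty`, `punishment`, `delay`) are hypothetical. It assumes that eating a poisonous berry pays off immediately, that the health penalty arrives `delay` steps later, and that returns are discounted by `gamma`. It covers only one facet of the problem: a delayed penalty is shrunk by `gamma ** delay`. Real learners also have to attribute the outcome across many intervening actions. Immediate punishment from other agents sidesteps this by placing a negative signal at the same step as the transgression.

```python
# Toy illustration with hypothetical numbers (not taken from the paper):
# how a delayed health penalty is discounted relative to immediate punishment.


def discounted_value(rewards, gamma):
    """Discounted return sum_t gamma**t * r_t for a reward sequence starting at t = 0."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def eat_poison_outcome(gamma, delay, berry_reward=1.0, health_penalty=-2.0, punishment=0.0):
    """Discounted value of eating a poisonous berry in this toy model.

    The berry reward and any punishment from other agents arrive at t = 0.
    The health penalty only arrives `delay` steps later.
    """
    rewards = [0.0] * (delay + 1)
    rewards[0] = berry_reward + punishment
    rewards[delay] += health_penalty  # with delay == 0 the penalty is also immediate
    return discounted_value(rewards, gamma)


if __name__ == "__main__":
    # Compare the same transgression without and with immediate punishment (taboo).
    for p in (0.0, -1.5):
        value = eat_poison_outcome(gamma=0.9, delay=20, punishment=p)
        print(f"punishment={p}: discounted value={value:.3f}")
```

With `gamma=0.9` and `delay=20`, the delayed penalty keeps only about 12% of its weight. Eating the poisonous berry therefore looks worthwhile (roughly 0.76) when there is no punishment. It looks clearly bad (roughly -0.74) once the taboo is enforced immediately.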
