RLupus: Cooperation through emergent communication in The Werewolf social deduction game

This paper focuses on the emergence of communication to support cooperation in environments modeled as social deduction games (SDGs), i.e., games in which players communicate freely to deduce each other's hidden intentions. We first state the problem by giving a general formalization of SDGs and a possible solution framework based on reinforcement learning. Next, we focus on a specific SDG, known as The Werewolf, and study whether and how various forms of communication influence the outcome of the game. Experimental results show that introducing a communication signal greatly increases the winning chances of one class of players. We also study the effect of the signal's length and range on overall performance, showing a non-linear relationship.
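To make the setup concrete, the following is a minimal sketch, not the paper's implementation, of a Werewolf-style round in which players emit a discrete communication signal before voting. The function names, the uniform-voting stub, and the specific parameter values are all illustrative assumptions; the two knobs varied in the paper, the signal's length and its range (vocabulary size), appear as `signal_len` and `vocab_size`.

```python
import random

def make_roles(n_players, n_wolves, rng):
    # Hidden role assignment: wolves vs. villagers.
    roles = ["wolf"] * n_wolves + ["villager"] * (n_players - n_wolves)
    rng.shuffle(roles)
    return roles

def communication_round(roles, signal_len, vocab_size, rng):
    # Each player broadcasts a discrete signal: a tuple of `signal_len`
    # symbols drawn from a vocabulary of size `vocab_size`.
    return {i: tuple(rng.randrange(vocab_size) for _ in range(signal_len))
            for i in range(len(roles))}

def voting_round(roles, signals, rng):
    # A learned policy would condition its vote on `signals`; this stub
    # ignores them and votes uniformly, i.e. a no-communication baseline.
    votes = [rng.randrange(len(roles)) for _ in range(len(roles))]
    return max(set(votes), key=votes.count)  # most-voted player is eliminated

rng = random.Random(0)
roles = make_roles(n_players=6, n_wolves=2, rng=rng)
signals = communication_round(roles, signal_len=3, vocab_size=4, rng=rng)
eliminated = voting_round(roles, signals, rng)
```

In an RL formulation, the signal dictionary would be appended to each agent's observation, and the eliminated player's role would contribute to the team rewards.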
