Multi-Agent Coordination in Adversarial Environments through Signal Mediated Strategies

Many real-world scenarios involve teams of agents that must coordinate their actions to reach a shared goal. We focus on the setting in which a team of agents faces an opponent in a zero-sum, imperfect-information game. Team members can coordinate their strategies before the game begins, but cannot communicate during play. This is the case, for example, in Bridge, collusion in poker, and collusion in bidding. In this setting, model-free reinforcement learning methods are often unable to capture coordination because agents' policies are executed in a decentralized fashion. Our first contribution is a game-theoretic, centralized training regimen that performs trajectory sampling so as to foster team coordination. When team members can observe each other's actions, we show that this approach provably yields equilibrium strategies. We then introduce a signaling-based framework to represent coordinated team strategies given a buffer of past experiences. Each team member's policy is parametrized as a neural network whose output is conditioned on a suitable exogenous signal, drawn from a learned probability distribution. By combining these two elements, we empirically show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms fail.
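
To make the signal-conditioned policy concrete, below is a minimal PyTorch sketch. It assumes a discrete signal space and a simple feed-forward policy; the class name SignalConditionedPolicy, the signal-space size, and the layer dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SignalConditionedPolicy(nn.Module):
    """Policy network for one team member: action logits are conditioned
    on an exogenous signal sampled once, before the game starts."""

    def __init__(self, obs_dim: int, num_signals: int, num_actions: int,
                 hidden_dim: int = 64):
        super().__init__()
        # Each discrete signal value gets its own learned embedding.
        self.signal_embedding = nn.Embedding(num_signals, hidden_dim)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, obs: torch.Tensor, signal: torch.Tensor) -> torch.Tensor:
        # Concatenate the observation with the embedded signal, so the
        # same network realizes a different behavior for each signal.
        z = self.signal_embedding(signal)
        return self.net(torch.cat([obs, z], dim=-1))

# The signal distribution is itself learnable: sampling one signal per
# episode and feeding it to every teammate correlates their policies
# without any in-game communication.
num_signals = 4  # hypothetical signal-space size
signal_logits = nn.Parameter(torch.zeros(num_signals))
signal = torch.distributions.Categorical(logits=signal_logits).sample()

policy = SignalConditionedPolicy(obs_dim=8, num_signals=num_signals,
                                 num_actions=3)
obs = torch.randn(8)  # dummy observation for illustration
action = torch.distributions.Categorical(logits=policy(obs, signal)).sample()
```

Because every teammate conditions on the same realized signal, the team's joint play is correlated in a way that independent decentralized policies cannot reproduce, which is the coordination gap the abstract describes.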
