Multi-Agent Coordination in Adversarial Environments through Signal Mediated Strategies

Many real-world scenarios involve teams of agents that must coordinate their actions to reach a shared goal. We focus on the setting in which a team of agents faces an opponent in a zero-sum, imperfect-information game. Team members can coordinate their strategies before the game begins, but cannot communicate during play. This is the case, for example, in Bridge, collusion in poker, and collusion in bidding. In this setting, model-free reinforcement learning methods are often unable to capture coordination because agents' policies are executed in a decentralized fashion. Our first contribution is a game-theoretic, centralized training regimen that performs trajectory sampling so as to foster team coordination. When team members can observe each other's actions, we show that this approach provably yields equilibrium strategies. We then introduce a signaling-based framework to represent coordinated team strategies given a buffer of past experiences. Each team member's policy is parametrized as a neural network whose output is conditioned on a suitable exogenous signal, drawn from a learned probability distribution. By combining these two elements, we empirically show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms fail.
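
To make the signal-conditioned policy concrete, below is a minimal PyTorch sketch. It assumes a discrete signal space and a simple feed-forward policy; the class name SignalConditionedPolicy, the signal-space size, and the layer dimensions are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SignalConditionedPolicy(nn.Module):
    """Policy network for one team member: action logits are conditioned
    on an exogenous signal sampled once, before the game starts."""

    def __init__(self, obs_dim: int, num_signals: int, num_actions: int,
                 hidden_dim: int = 64):
        super().__init__()
        # Each discrete signal value gets its own learned embedding.
        self.signal_embedding = nn.Embedding(num_signals, hidden_dim)
        self.net = nn.Sequential(
            nn.Linear(obs_dim + hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, obs: torch.Tensor, signal: torch.Tensor) -> torch.Tensor:
        # Concatenate the observation with the embedded signal, so the
        # same network realizes a different behavior for each signal.
        z = self.signal_embedding(signal)
        return self.net(torch.cat([obs, z], dim=-1))

# The signal distribution is itself learnable: sampling one signal per
# episode and feeding it to every teammate correlates their policies
# without any in-game communication.
num_signals = 4  # hypothetical signal-space size
signal_logits = nn.Parameter(torch.zeros(num_signals))
signal = torch.distributions.Categorical(logits=signal_logits).sample()

policy = SignalConditionedPolicy(obs_dim=8, num_signals=num_signals,
                                 num_actions=3)
obs = torch.randn(8)  # dummy observation for illustration
action = torch.distributions.Categorical(logits=policy(obs, signal)).sample()
```

Because every teammate conditions on the same realized signal, the team's joint play is correlated in a way that independent decentralized policies cannot reproduce, which is the coordination gap the abstract describes.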
