Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition

In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team's overarching goals. Coordinating a team with such dynamic composition is challenging: the optimal strategy varies with the composition. We propose COPA, a coach-player framework that tackles this problem. We assume the coach has a global view of the environment and coordinates the players, who only have partial views, by distributing individual strategies. Specifically, we 1) adopt an attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method that lets the coach decide when to communicate with the players. We validate our method on a resource collection task, a rescue game, and StarCraft micromanagement tasks, and demonstrate zero-shot generalization to new team compositions. Our method achieves performance comparable to or better than the setting in which every player has a full view of the environment. Moreover, performance remains high even when the coach communicates as little as 13% of the time under the adaptive communication strategy.
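To make the framework concrete, the sketch below illustrates, under stated assumptions, how a coach with a global view might attend over per-player features to produce individual strategy vectors, and how an adaptive gate might decide when to re-broadcast them. The module names, dimensions, and the drift-based communication criterion are illustrative assumptions, not the exact architecture or rule used in COPA.

```python
# Minimal sketch (not the authors' exact architecture): a coach with a global
# view attends over player embeddings to produce per-player strategy vectors,
# and an adaptive gate re-broadcasts a player's strategy only when it drifts
# beyond a threshold. All names, dimensions, and the drift criterion are
# illustrative assumptions.
import torch
import torch.nn as nn


class CoachSketch(nn.Module):
    def __init__(self, obs_dim: int, embed_dim: int = 64, n_heads: int = 4,
                 comm_threshold: float = 0.5):
        super().__init__()
        self.encode = nn.Linear(obs_dim, embed_dim)          # per-player global features -> embedding
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.to_strategy = nn.Linear(embed_dim, embed_dim)   # attended embedding -> strategy vector
        self.comm_threshold = comm_threshold

    def forward(self, global_obs: torch.Tensor, cached: torch.Tensor):
        # global_obs: (batch, n_players, obs_dim); n_players may change across episodes,
        #             which attention handles without changing any parameter shapes
        # cached:     (batch, n_players, embed_dim) strategies from the last broadcast
        h = self.encode(global_obs)
        h, _ = self.attn(h, h, h)                             # players attend to each other via the coach
        z = self.to_strategy(h)
        # Adaptive communication: send a new strategy only if it moved far enough
        # from the cached one (per-player L2 drift, a stand-in criterion).
        drift = (z - cached).norm(dim=-1, keepdim=True)
        send = (drift > self.comm_threshold).float()
        broadcast = send * z + (1.0 - send) * cached
        return broadcast, send


# Toy usage: a team of 3 players, each described by a 10-dim global feature vector.
coach = CoachSketch(obs_dim=10)
obs = torch.randn(1, 3, 10)
cached = torch.zeros(1, 3, 64)
strategies, sent_mask = coach(obs, cached)
print(strategies.shape, sent_mask.squeeze(-1))
```

Because the coach pools information with attention rather than a fixed-size input layer, the same parameters apply when players join or leave, which is what enables evaluation on team compositions unseen during training.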
