From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

Considerable effort has been devoted to investigating how agents can learn effectively and achieve coordination in multiagent systems. However, learning remains challenging in large-scale multiagent settings because of the complex dynamics between the environment and the agents and the explosion of the state-action space. In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) approach that solves large-scale problems by first learning in a multiagent scenario with a small number of agents and then progressively increasing the number of agents. We propose three transfer mechanisms across curricula to accelerate the learning process. Moreover, because the state dimension varies across curricula, existing network structures cannot be applied in such a transfer setting, since their network input sizes are fixed. We therefore design a novel network structure called Dynamic Agent-number Network (DyAN) to handle the dynamic size of the network input. Experimental results show that DyMA-CL using DyAN greatly improves the performance of large-scale multiagent learning compared with state-of-the-art deep reinforcement learning approaches. We also investigate the influence of the three transfer mechanisms across curricula through extensive simulations.
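To illustrate why a fixed-size input blocks transfer across curricula, the following is a minimal sketch of one common way to make a value network independent of the agent count: embed each agent's observation with shared weights and sum-pool the embeddings. This is not the DyAN architecture itself, which the abstract does not specify. The pooling choice, the dimensions, and the names `q_values`, `W_embed`, and `W_out` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
FEAT_DIM = 4    # per-agent observation size
HIDDEN = 8      # width of the shared per-agent embedding
N_ACTIONS = 3   # number of discrete actions

# Shared weights: their shapes do not depend on the number of agents,
# so the same parameters can be reused when the curriculum adds agents.
W_embed = rng.normal(scale=0.1, size=(FEAT_DIM, HIDDEN))
W_out = rng.normal(scale=0.1, size=(2 * HIDDEN, N_ACTIONS))


def q_values(own_obs, other_obs):
    """Return action values for one agent.

    own_obs:   array of shape (FEAT_DIM,)
    other_obs: array of shape (n_others, FEAT_DIM), n_others may vary
    """
    own = np.tanh(own_obs @ W_embed)
    # Sum pooling collapses the variable-length agent axis to a fixed size
    # and is invariant to the order in which other agents are listed.
    others = np.tanh(other_obs @ W_embed).sum(axis=0)
    return np.concatenate([own, others]) @ W_out


# Hypothetical curriculum: the same weights are evaluated as agents are added.
for n_agents in (3, 5, 10):
    own = rng.normal(size=FEAT_DIM)
    others = rng.normal(size=(n_agents - 1, FEAT_DIM))
    print(n_agents, q_values(own, others).shape)
```

Because the pooled vector always has length `HIDDEN`, parameters learned with few agents can be loaded unchanged for a larger scenario, which is the property a curriculum over agent counts relies on.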
