Multiagent Reinforcement Learning for Swarm Confrontation Environments

The swarm confrontation problem is always a hot research topic, which has attracted much attention. Previous research focuses on devising rules to improve the intelligence of the swarm, which is not suitable for complex scenarios. Multi-agent reinforcement learning has been used in some similar confrontation tasks. However, many of these works take centralized method to control all entities in a swarm, which is hard to meet the real-time requirement of practical systems. Recently, OpenAI proposes Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG), which can be used for centralized training but decentralized execution in multi-agent environments. We examine the method in our constructed swarm confrontation environment and find that it is not easy to deal with complex scenarios. We propose two improved training methods, scenario-transfer training and self-play training, which greatly enhance the performance of MADDPG. Experimental results show that the scenario-transfer training accelerate the convergence speed by 50%, and the self-play training increases the winning rate of MADDPG from 42% to 96%.

[1]  Marios M. Polycarpou,et al.  Cooperative real-time search and task allocation in UAV teams , 2003, 42nd IEEE International Conference on Decision and Control (IEEE Cat. No.03CH37475).

[2]  Eva Besada-Portas,et al.  Evolutionary Trajectory Planner for Multiple UAVs in Realistic Scenarios , 2010, IEEE Transactions on Robotics.

[3]  Bart De Schutter,et al.  A Comprehensive Survey of Multiagent Reinforcement Learning , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[4]  Guillaume J. Laurent,et al.  Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems , 2012, The Knowledge Engineering Review.

[5]  David Silver,et al.  Deep Reinforcement Learning from Self-Play in Imperfect-Information Games , 2016, ArXiv.

[6]  Dorian Kodelja,et al.  Multiagent cooperation and competition with deep reinforcement learning , 2015, PloS one.

[7]  Jun Wang,et al.  Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games , 2017, ArXiv.

[8]  Laura E. Barnes,et al.  Effective robot team control methodologies for battlefield applications , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[9]  Luitpold Babel,et al.  Coordinated Target Assignment and UAV Path Planning with Timing Constraints , 2018, J. Intell. Robotic Syst..

[10]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[11]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[12]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[13]  Michael R. Genesereth,et al.  General Game Playing: Overview of the AAAI Competition , 2005, AI Mag..

[14]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[15]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[16]  Yi Wu,et al.  Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.

[17]  Bart De Schutter,et al.  Multi-agent Reinforcement Learning: An Overview , 2010 .