A deep reinforcement learning-based method applied for solving multi-agent defense and attack problems

Abstract Learning to cooperate among agents has long been an important research topic in artificial intelligence. Multi-agent defense and attack, one of the key problems in multi-agent cooperation, requires the agents in the environment to learn effective strategies to achieve their goals. Deep reinforcement learning (DRL) algorithms have natural advantages in continuous control problems, especially in settings with dynamic interactions, and have provided new solutions to these long-studied multi-agent cooperation problems. In this paper, we start from the deep deterministic policy gradient (DDPG) algorithm and then introduce multi-agent DDPG (MADDPG) to solve the multi-agent defense and attack problem under different situations. We reconstruct the considered environment, redefine the continuous state space, continuous action space, and reward functions accordingly, and then apply DRL algorithms to obtain effective decision strategies. Several experiments covering different confrontation scenarios are conducted to validate the feasibility and effectiveness of the DRL-based methods. Experimental results show that through learning the agents make better decisions, and that learning with MADDPG achieves superior performance to learning with other DRL-based models, which also demonstrates the importance and necessity of exploiting other agents' information.
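As a rough illustration of why access to other agents' information matters, the sketch below (a minimal PyTorch example, not the authors' implementation; all network sizes, dimensions, and names are assumptions) contrasts a standard DDPG critic, which conditions only on its own agent's observation and action, with the centralized MADDPG critic, which conditions on the joint observations and actions of all agents during training while execution remains decentralized through each agent's actor.

```python
# Minimal sketch of the structural difference between a per-agent DDPG critic
# and a centralized MADDPG critic. Hidden sizes and dimensions are illustrative
# assumptions, not values taken from the paper.
import torch
import torch.nn as nn


class DDPGCritic(nn.Module):
    """Q(o_i, a_i): sees only the owning agent's observation and action."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


class MADDPGCritic(nn.Module):
    """Q_i(o_1..o_N, a_1..a_N): centralized critic used only during training;
    at execution time each agent acts from its own decentralized policy."""

    def __init__(self, n_agents: int, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        # all_obs: (batch, n_agents * obs_dim); all_acts: (batch, n_agents * act_dim)
        return self.net(torch.cat([all_obs, all_acts], dim=-1))


if __name__ == "__main__":
    batch, n_agents, obs_dim, act_dim = 32, 3, 10, 2
    ddpg_q = DDPGCritic(obs_dim, act_dim)
    maddpg_q = MADDPGCritic(n_agents, obs_dim, act_dim)
    print(ddpg_q(torch.randn(batch, obs_dim), torch.randn(batch, act_dim)).shape)
    print(maddpg_q(torch.randn(batch, n_agents * obs_dim),
                   torch.randn(batch, n_agents * act_dim)).shape)
```

In this sketch, giving the critic the joint observation-action vector is what lets each agent's value estimate account for the other agents' behavior, which is the property the experimental comparison with single-agent DRL models is attributed to.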
