Paper Information - Attention-Based Population-Invariant Deep Reinforcement Learning for Collision-Free Flocking with A Scalable Fixed-Wing UAV Swarm

A swarm of fixed-wing unmanned aerial vehicles (UAVs) is expected to efficiently accomplish various tasks in complex scenarios. This paper proposes an attention-based population-invariant multi-agent deep reinforcement learning (MADRL) approach to the decentralized collision-free flocking problem for a scalable fixed-wing UAV swarm. First, the problem is modeled as a decentralized partially observable Markov decision process from the perspective of each follower. Then, an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm is presented to efficiently learn a population-invariant flocking policy. In this algorithm, parameter sharing with an ego-centric representation is incorporated to improve learning efficiency. In addition, an attention-based population-invariant network structure (APINet) is designed by leveraging the self-attention mechanism. With this structure, the learned flocking policy is invariant to the population size of the swarm. Finally, both numerical and hardware-in-the-loop simulation results verify the efficiency and scalability of the proposed approach.
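
The abstract does not detail the APINet architecture, so the following is only a minimal sketch of the general idea behind population invariance: attention-weighted pooling turns a variable-length set of neighbor observations into a fixed-size vector, regardless of how many neighbors there are or in what order they arrive. Everything here is an assumption made for illustration: the function name `attention_pool`, the three-dimensional feature vectors, and the omission of the learned query/key/value projections that a trained network would have.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(ego, neighbors):
    """Pool a variable-length neighbor list into a fixed-size vector.

    ego: feature vector of the ego UAV, used as the query (hypothetical layout).
    neighbors: list of neighbor feature vectors, used as keys and values.
    Illustrative only: raw features stand in for learned projections.
    """
    dim = len(ego)
    if not neighbors:
        return [0.0] * dim  # no neighbors observed: return a zero embedding
    scale = math.sqrt(dim)
    weights = softmax([dot(ego, n) / scale for n in neighbors])
    # Weighted sum of neighbor vectors; output size depends only on dim.
    return [sum(w * n[i] for w, n in zip(weights, neighbors)) for i in range(dim)]


ego = [1.0, 0.0, 0.5]
two = [[0.2, 0.1, 0.0], [0.9, -0.3, 0.4]]
five = two + [[0.0, 1.0, 0.0], [-0.5, 0.2, 0.1], [0.3, 0.3, 0.3]]

out_two = attention_pool(ego, two)
out_five = attention_pool(ego, five)
out_shuffled = attention_pool(ego, list(reversed(five)))

# Both embeddings have the same length even though the swarm size differs.
print(len(out_two), len(out_five))
```

Because the output length depends only on the feature dimension, a downstream policy network of fixed input size could in principle consume it for any swarm size; the pooled vector is also unchanged (up to floating-point rounding) when the neighbor list is reordered.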

Lincheng Shen | K. H. Low | Xiaojia Xiang | Tianjiang Hu | Chao Yan
