Attention-Based Population-Invariant Deep Reinforcement Learning for Collision-Free Flocking with a Scalable Fixed-Wing UAV Swarm

A swarm of fixed-wing unmanned aerial vehicles (UAVs) is expected to accomplish a variety of tasks efficiently in complex scenarios. This paper proposes an attention-based, population-invariant multi-agent deep reinforcement learning (MADRL) approach to the decentralized collision-free flocking problem for a scalable fixed-wing UAV swarm. First, the problem is modeled as a decentralized partially observable Markov decision process from the perspective of each follower. Then, an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm is presented to learn a population-invariant flocking policy efficiently. The algorithm incorporates parameter sharing with an ego-centric representation to improve learning efficiency. In addition, an attention-based population-invariant network structure (APINet) is designed using the self-attention mechanism, so that the learned flocking policy is invariant to the number of UAVs in the swarm. Finally, both numerical and hardware-in-the-loop simulation results verify the efficiency and scalability of the proposed approach.
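
The abstract does not give the internals of APINet, so the following is only a minimal sketch of the general idea it names: attention pooling can map a variable number of neighbor observations to a fixed-size vector, which is one way a policy input can be made independent of swarm size. The function `attention_pool`, the weight matrices `Wq`, `Wk`, `Wv`, and the dimensions `obs_dim` and `d` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_pool(ego, neighbors, Wq, Wk, Wv):
    """Aggregate N neighbor observations into one fixed-size vector.

    ego:       (obs_dim,)   ego-centric observation of the follower (assumed layout)
    neighbors: (N, obs_dim) observations of N neighbors, N may vary
    Wq/Wk/Wv:  (obs_dim, d) projection matrices (random here, learned in practice)
    """
    q = ego @ Wq                            # query from the ego agent, shape (d,)
    K = neighbors @ Wk                      # one key per neighbor, shape (N, d)
    V = neighbors @ Wv                      # one value per neighbor, shape (N, d)
    scores = K @ q / np.sqrt(q.shape[0])    # scaled dot-product scores, shape (N,)
    weights = softmax(scores)               # attention weights sum to 1
    return weights @ V                      # weighted sum of values, shape (d,)

rng = np.random.default_rng(0)
obs_dim, d = 4, 8                           # illustrative sizes, not from the paper
Wq, Wk, Wv = (rng.standard_normal((obs_dim, d)) for _ in range(3))
ego = rng.standard_normal(obs_dim)

few = rng.standard_normal((3, obs_dim))     # swarm with 3 neighbors
many = rng.standard_normal((10, obs_dim))   # swarm with 10 neighbors

out_few = attention_pool(ego, few, Wq, Wk, Wv)
out_many = attention_pool(ego, many, Wq, Wk, Wv)
out_shuffled = attention_pool(ego, few[::-1], Wq, Wk, Wv)

print(out_few.shape, out_many.shape)        # both outputs have length d
print(np.allclose(out_few, out_shuffled))   # neighbor order does not matter
```

Because the output length depends only on `d`, the same downstream actor network could in principle be applied whether the follower sees 3 neighbors or 10, and the weighted sum makes the result independent of the order in which neighbors are listed.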
