Flocking and Collision Avoidance for a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement Learning

Developing flocking behavior for a dynamic squad of fixed-wing UAVs remains a challenge because of kinematic complexity and environmental uncertainty. In this paper, we address the decentralized flocking and collision avoidance problem through deep reinforcement learning (DRL). Specifically, we formulate a decentralized DRL-based decision-making framework from the perspective of each follower, in which a collision avoidance mechanism is integrated into the flocking controller. We then propose a novel reinforcement learning algorithm, PS-CACER, for training a control policy shared by all the followers. In addition, we design a plug-and-play embedding module based on convolutional neural networks and the attention mechanism. This module encodes the variable-length system state into a fixed-length embedding vector, which makes the learned DRL policy independent of the number and the order of followers. Finally, numerical simulation results demonstrate the effectiveness of the proposed method, and the learned policies can be transferred directly to semi-physical simulation without any parameter fine-tuning.
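The idea of encoding a variable-length state into a fixed-length vector can be illustrated with a minimal attention-pooling sketch. This is not the paper's embedding module: it omits the convolutional part, uses random weights in place of learned ones, and the names `attention_pool`, `w_score`, and `w_value` are hypothetical. It only shows why a softmax-weighted sum over per-follower features yields an output whose size and value do not depend on how many followers there are or in what order they are listed.

```python
import numpy as np

def attention_pool(states, w_score, w_value):
    """Encode an (n, d) array of per-follower states into a length-k vector.

    Illustrative only: w_score (d,) and w_value (d, k) stand in for
    parameters that would normally be learned.
    """
    states = np.asarray(states, dtype=float)
    scores = states @ w_score              # one scalar score per follower, shape (n,)
    scores = scores - scores.max()         # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()      # softmax over followers
    values = np.tanh(states @ w_value)     # per-follower features, shape (n, k)
    return weights @ values                # weighted sum, shape (k,)

rng = np.random.default_rng(0)
d, k = 4, 8                                # assumed state and embedding sizes
w_score = rng.normal(size=d)
w_value = rng.normal(size=(d, k))

three = rng.normal(size=(3, d))            # a squad with 3 followers
five = rng.normal(size=(5, d))             # a squad with 5 followers

e3 = attention_pool(three, w_score, w_value)
e5 = attention_pool(five, w_score, w_value)
e3_shuffled = attention_pool(three[::-1], w_score, w_value)
```

Both `e3` and `e5` have length `k`, and `e3_shuffled` matches `e3` up to floating-point error, because the weighted sum is symmetric in its inputs.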