Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach

Abstract

Flocking of fixed-wing UAVs (Unmanned Aerial Vehicles) remains a challenging problem because of complex kinematics and environmental dynamics. In this paper, we solve the leader–followers flocking problem using a novel deep reinforcement learning algorithm that generates roll-angle and velocity commands by training an end-to-end controller in continuous state and action spaces. Specifically, we choose CACLA (Continuous Actor–Critic Learning Automaton) as the base algorithm and use multi-layer perceptrons to represent both the actor and the critic. We further improve learning efficiency with experience replay, which stores training data in an experience memory and samples from that memory as needed. We have compared the performance of the proposed CACER (Continuous Actor–Critic with Experience Replay) algorithm with benchmark algorithms such as DDPG and double DQN in numerical simulation, and we have demonstrated the performance of the learned optimal policy in semi-physical simulation without any parameter tuning.
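The abstract describes CACER as CACLA combined with experience replay. The sketch below shows that combination under several assumptions. Linear actor and critic stand in for the paper's multi-layer perceptrons. A TD(0) error δ = r + γV(s') − V(s) drives the critic. Following the CACLA rule, the actor is moved toward the executed action only when δ > 0. When a stored transition is replayed, δ is recomputed with the current critic. The names `ReplayMemory`, `LinearCACLA`, and `train_toy`, the one-step toy task, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random
from collections import deque


def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


class ReplayMemory:
    """Fixed-capacity experience memory; the oldest transitions are evicted first."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))


class LinearCACLA:
    """CACLA update rule with linear actor/critic (hypothetical stand-ins for MLPs)."""

    def __init__(self, state_dim, action_dim, alpha_critic=0.1, alpha_actor=0.05, gamma=0.9):
        self.w_v = [0.0] * state_dim                               # critic: V(s) = w_v . s
        self.W_a = [[0.0] * state_dim for _ in range(action_dim)]  # actor: A(s) = W_a s
        self.alpha_c, self.alpha_a, self.gamma = alpha_critic, alpha_actor, gamma

    def value(self, s):
        return dot(self.w_v, s)

    def act(self, s):
        return [dot(row, s) for row in self.W_a]

    def update(self, s, a, r, s_next, done):
        # TD(0) error computed with the current critic.
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # Critic: semi-gradient TD(0) step for a linear value function.
        self.w_v = [w + self.alpha_c * delta * x for w, x in zip(self.w_v, s)]
        # Actor (CACLA): move A(s) toward the executed action only if it did better than expected.
        if delta > 0:
            mean = self.act(s)
            self.W_a = [[w + self.alpha_a * (ai - mi) * x for w, x in zip(row, s)]
                        for row, ai, mi in zip(self.W_a, a, mean)]
        return delta

    def replay(self, memory, batch_size):
        # Assumption: replayed transitions reuse the same rule, with delta recomputed now.
        for transition in memory.sample(batch_size):
            self.update(*transition)


def train_toy(steps=1000, sigma=0.3, batch_size=16, seed=0):
    """Hypothetical one-step task with reward -(a - 0.5)^2, so the best action is 0.5."""
    rng = random.Random(seed)
    agent = LinearCACLA(state_dim=1, action_dim=1)
    memory = ReplayMemory(capacity=500, seed=seed + 1)
    s = (1.0,)                                                   # constant bias feature
    for _ in range(steps):
        a = [m + rng.gauss(0.0, sigma) for m in agent.act(s)]    # Gaussian exploration
        r = -(a[0] - 0.5) ** 2
        memory.push(s, a, r, s, True)                            # every episode ends after one step
        agent.update(s, a, r, s, True)                           # online update
        agent.replay(memory, batch_size)                         # extra updates from memory
    return agent
```

On this toy task, `train_toy().act((1.0,))` should drift toward 0.5. CACLA differs from DDPG, one of the benchmarks named in the abstract, in how it updates the actor. CACLA updates the actor only on a positive TD error and moves it toward the action actually taken. DDPG instead follows the gradient of a learned Q-function with respect to the action.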
