A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs

Controlling a squad of fixed-wing UAVs is challenging due to their complex kinematics and the dynamics of the environment. In this paper, we develop a novel actor-critic reinforcement learning approach to the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose the CACER algorithm, which represents both the actor and the critic with multilayer perceptrons; this deeper structure provides a better function approximator than the original continuous actor-critic learning automaton (CACLA) algorithm. In addition, we propose a double prioritized experience replay (DPER) mechanism to further improve training efficiency: state-transition samples are stored in two separate experience replay buffers for updating the actor and the critic, with sample priorities computed from the temporal-difference errors. We not only compare CACER with CACLA and the benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrate the performance of CACER in a semi-physical simulation by transferring the policy learned in numerical simulation without parameter tuning.
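
To make the DPER mechanism concrete, the sketch below gives one plausible reading of it in Python: every transition receives a priority derived from its temporal-difference error and is stored in a critic buffer, while only transitions whose TD error is positive (the CACLA-style condition for an actor update) are also stored in an actor buffer. The class, the buffer names, and the positive-TD-error gating rule are illustrative assumptions for this sketch, not the paper's implementation.

import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class PrioritizedBuffer:
    """Proportional prioritized replay in the spirit of Schaul et al. (2015)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities bias sampling
        self.data = []
        self.priorities = []
        self.pos = 0

    def add(self, transition, td_error):
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:                     # overwrite the oldest entry once full
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        return [self.data[i] for i in idx]

# Double prioritized experience replay: two buffers filled from the same stream
# of transitions but used for different updates.
critic_buffer = PrioritizedBuffer(capacity=100_000)   # samples for critic updates
actor_buffer = PrioritizedBuffer(capacity=100_000)    # samples for actor updates

def store(transition, td_error):
    critic_buffer.add(transition, td_error)
    if td_error > 0:              # assumed gating: action beat the critic's estimate
        actor_buffer.add(transition, td_error)

During training, each buffer would then be sampled independently: batches from critic_buffer drive temporal-difference updates of the critic, while batches from actor_buffer pull the actor's output toward the actions that outperformed the critic's estimate.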
