Morphing Strategy Design for UAV based on Prioritized Sweeping Reinforcement Learning

This paper proposes an improved deep deterministic policy gradient (DDPG) algorithm for designing the morphing policy of a class of morphing unmanned aerial vehicles (UAVs). Because random selection is not always an efficient way to drive iterative updates in a reinforcement learning structure, a prioritized sweeping approach is introduced into the DDPG-based deep reinforcement learning framework, and the original DDPG algorithm is modified so that state-action pairs (SAPs) are no longer selected at random. This mitigates the loss of efficiency seen in the traditional reinforcement learning structure. The improved DDPG algorithm achieves better learning performance and can make reasonable decisions in response to environmental changes. A reinforcement learning model based on a Markov decision process is built, and simulation experiments on it verify the effectiveness and superiority of the proposed algorithm.
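The idea of replacing random selection of SAPs with priority-ordered selection can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how priorities are computed, so the sketch assumes a priority equal to the absolute temporal-difference (TD) error, and the names `PrioritizedBuffer`, `add`, and `pop_batch` are hypothetical.

```python
import heapq
import itertools


class PrioritizedBuffer:
    """Stores transitions and returns them in order of decreasing priority.

    Assumption: priority = |TD error| + epsilon. The paper may use a
    different priority measure; this is only an illustration.
    """

    def __init__(self, epsilon=1e-6):
        self._heap = []
        # Monotonic counter breaks ties so transitions are never compared.
        self._counter = itertools.count()
        self._epsilon = epsilon

    def add(self, transition, td_error):
        """Insert a transition (state, action, reward, next_state)."""
        priority = abs(td_error) + self._epsilon
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), transition))

    def pop_batch(self, batch_size):
        """Remove and return up to batch_size highest-priority transitions."""
        batch = []
        while self._heap and len(batch) < batch_size:
            neg_priority, _, transition = heapq.heappop(self._heap)
            batch.append((transition, -neg_priority))
        return batch

    def __len__(self):
        return len(self._heap)


# Usage: the transition with the largest |TD error| is selected first.
buffer = PrioritizedBuffer()
buffer.add(("s0", "a0", 0.0, "s1"), td_error=0.1)
buffer.add(("s1", "a1", 1.0, "s2"), td_error=-2.0)
buffer.add(("s2", "a2", 0.5, "s3"), td_error=0.7)
batch = buffer.pop_batch(2)
selected_states = [transition[0] for transition, _ in batch]
```

In a DDPG training loop, the selected batch would feed the critic and actor updates, after which the transitions could be re-inserted with their refreshed TD errors. That re-insertion step is omitted here for brevity.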
