Deep Reinforcement Learning Approach with Multiple Experience Pools for UAV’s Autonomous Motion Planning in Complex Unknown Environments

Autonomous motion planning (AMP) of unmanned aerial vehicles (UAVs) aims to enable a UAV to fly safely to a target without human intervention. Recently, several emerging deep reinforcement learning (DRL) methods have been applied to the AMP problem in simplified environments and have yielded good results. This paper proposes a multiple experience pools (MEP) framework that leverages human expert experience to speed up DRL training. Building on the deep deterministic policy gradient (DDPG) algorithm, an MEP–DDPG algorithm is designed in which model predictive control and simulated annealing generate the expert experiences. The algorithm is applied to a complex unknown simulation environment built from the parameters of a real UAV; training experiments show that the proposed method improves performance by more than 20% over the state-of-the-art DDPG. Test results further indicate that UAVs trained with MEP–DDPG can stably complete a variety of tasks in complex unknown environments.
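As a rough illustration of the multiple-experience-pool idea described in the abstract, the sketch below keeps a demonstration pool (filled offline by an MPC/simulated-annealing planner) alongside the agent's own replay buffer and mixes the two when drawing DDPG training batches. The class names, the `expert_fraction` mixing ratio, and the transition format are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of the multiple-experience-pool idea (assumptions noted above).
import random
from collections import deque

class ExperiencePool:
    """A bounded FIFO replay buffer holding (s, a, r, s_next, done) tuples."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, n):
        # Never request more samples than are stored.
        n = min(n, len(self.buffer))
        return random.sample(self.buffer, n)

    def __len__(self):
        return len(self.buffer)

# One pool filled offline with expert transitions (e.g. from an MPC planner
# refined by simulated annealing), one filled online by the learning agent.
expert_pool = ExperiencePool(capacity=100_000)
agent_pool = ExperiencePool(capacity=1_000_000)

def sample_mixed_batch(batch_size, expert_fraction=0.25):
    """Draw a mini-batch that mixes expert and agent experience.

    expert_fraction is a hypothetical hyperparameter; annealing it toward 0
    as training progresses is one common way to wean the policy off the
    demonstrations.
    """
    n_expert = int(batch_size * expert_fraction)
    batch = expert_pool.sample(n_expert)
    batch += agent_pool.sample(batch_size - len(batch))
    random.shuffle(batch)
    return batch  # feed to the standard DDPG critic/actor update
```

The returned batch plugs into an otherwise unchanged DDPG update loop, so the expert data only changes what the networks are trained on, not how they are trained.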
