Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach

In this paper, we propose a deep reinforcement learning (DRL)-based method that enables unmanned aerial vehicles (UAVs) to perform navigation tasks in large-scale complex environments, a capability important for applications such as goods delivery and remote surveillance. The problem is formulated as a partially observable Markov decision process (POMDP) and solved by a novel online DRL algorithm built on two rigorously proved policy gradient theorems within the actor-critic framework. In contrast to conventional approaches based on simultaneous localization and mapping (SLAM) or on sensing and avoidance, our method directly maps the UAV's raw sensory measurements into control signals for navigation. Experimental results demonstrate that our method enables UAVs to navigate autonomously in a virtual large-scale complex environment and generalizes to more complex, larger-scale, and three-dimensional environments. Moreover, the proposed online DRL algorithm for POMDPs outperforms the state of the art.
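To make the "raw sensory measurements to control signals under partial observability" idea concrete, the following is a minimal sketch (not the paper's actual architecture; layer sizes, the simple RNN cell, and the tanh-bounded action are illustrative assumptions) of a recurrent actor: a hidden state integrates the history of raw range readings, standing in for the belief state a POMDP policy needs, and an output layer maps that state to a bounded control command.

```python
import numpy as np

class RecurrentActor:
    """Toy recurrent policy: raw observations -> bounded control signal."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_in = rng.normal(0, s, (hidden_dim, obs_dim))    # obs -> hidden
        self.W_h = rng.normal(0, s, (hidden_dim, hidden_dim))  # recurrent weights
        self.W_out = rng.normal(0, s, (act_dim, hidden_dim))   # hidden -> action
        self.h = np.zeros(hidden_dim)                          # belief-like state

    def reset(self):
        """Clear the hidden state at the start of an episode."""
        self.h[:] = 0.0

    def act(self, obs):
        # Fold the current raw observation into the recurrent state,
        # so the action can depend on the observation history.
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        # Deterministic action squashed into [-1, 1].
        return np.tanh(self.W_out @ self.h)

# Usage: feed a short sequence of mock range-sensor scans.
actor = RecurrentActor(obs_dim=8, hidden_dim=16, act_dim=2)
actor.reset()
rng = np.random.default_rng(1)
for _ in range(5):
    ranges = rng.uniform(0.0, 1.0, 8)  # mock lidar-like scan
    action = actor.act(ranges)
```

In an actor-critic setup, a critic network with the same recurrent structure would estimate values from the hidden state, and both networks would be trained online from the policy gradient; none of that training machinery is shown here.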
