Autonomous Navigation of UAVs in Large-Scale Complex Environments: A Deep Reinforcement Learning Approach

In this paper, we propose a deep reinforcement learning (DRL)-based method that enables unmanned aerial vehicles (UAVs) to perform navigation tasks in large-scale complex environments, a capability important for applications such as goods delivery and remote surveillance. The problem is formulated as a partially observable Markov decision process (POMDP) and solved by a novel online DRL algorithm built on two rigorously proved policy gradient theorems within the actor-critic framework. In contrast to conventional approaches based on simultaneous localization and mapping (SLAM) or on sensing and avoidance, our method directly maps the UAV's raw sensory measurements into control signals for navigation. Experimental results demonstrate that our method enables UAVs to navigate autonomously in a virtual large-scale complex environment and generalizes to more complex, larger-scale, and three-dimensional environments. Moreover, the proposed online DRL algorithm for POMDPs outperforms the state of the art.
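To make the "raw sensory measurements to control signals under partial observability" idea concrete, the following is a minimal sketch (not the paper's actual architecture; layer sizes, the simple RNN cell, and the tanh-bounded action are illustrative assumptions) of a recurrent actor: a hidden state integrates the history of raw range readings, standing in for the belief state a POMDP policy needs, and an output layer maps that state to a bounded control command.

```python
import numpy as np

class RecurrentActor:
    """Toy recurrent policy: raw observations -> bounded control signal."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_in = rng.normal(0, s, (hidden_dim, obs_dim))    # obs -> hidden
        self.W_h = rng.normal(0, s, (hidden_dim, hidden_dim))  # recurrent weights
        self.W_out = rng.normal(0, s, (act_dim, hidden_dim))   # hidden -> action
        self.h = np.zeros(hidden_dim)                          # belief-like state

    def reset(self):
        """Clear the hidden state at the start of an episode."""
        self.h[:] = 0.0

    def act(self, obs):
        # Fold the current raw observation into the recurrent state,
        # so the action can depend on the observation history.
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        # Deterministic action squashed into [-1, 1].
        return np.tanh(self.W_out @ self.h)

# Usage: feed a short sequence of mock range-sensor scans.
actor = RecurrentActor(obs_dim=8, hidden_dim=16, act_dim=2)
actor.reset()
rng = np.random.default_rng(1)
for _ in range(5):
    ranges = rng.uniform(0.0, 1.0, 8)  # mock lidar-like scan
    action = actor.act(ranges)
```

In an actor-critic setup, a critic network with the same recurrent structure would estimate values from the hidden state, and both networks would be trained online from the policy gradient; none of that training machinery is shown here.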
