Online Deep Reinforcement Learning for Autonomous UAV Navigation and Exploration of Outdoor Environments

With the rapid expansion in the use of UAVs, the ability to autonomously navigate across varying environments and weather conditions remains a highly desirable but as-yet-unsolved challenge. In this work, we use Deep Reinforcement Learning to continuously improve the learning and understanding of a UAV agent while it explores a partially observable environment, simulating the challenges faced in real-life scenarios. Our approach uses a double state-input strategy that combines knowledge acquired from the raw image with a map containing positional information. The positional data aids the network's understanding of where the UAV has been and how far it is from the target position, while the feature map from the current scene highlights cluttered areas to be avoided. We extensively test our approach using variants of Deep Q-Network adapted to cope with double state-input data. Further, we demonstrate that by altering the reward and the Q-value function, the agent consistently outperforms the adapted Deep Q-Network, Double Deep Q-Network and Deep Recurrent Q-Network. Our results demonstrate that our proposed Extended Double Deep Q-Network (EDDQN) approach is capable of navigating through multiple unseen environments and under severe weather conditions.
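To make the double state-input idea concrete, the sketch below shows a tiny two-branch Q-network in NumPy: one branch encodes the (flattened) image features, the other encodes the positional map, and their concatenation is mapped to per-action Q-values, together with the Double DQN target rule the EDDQN variant builds on. This is an illustrative sketch, not the paper's exact architecture; the dimensions, action set, and all function names (`q_values`, `double_dqn_target`, etc.) are assumptions for exposition.

```python
import numpy as np

N_ACTIONS = 5   # hypothetical discrete action set (e.g. forward, left, right, up, down)
IMG_DIM = 64    # flattened image-feature vector length (assumed)
MAP_DIM = 16    # flattened positional-map vector length (assumed)
HIDDEN = 32     # per-branch hidden width (assumed)

def init_params(rng):
    """Random weights for a tiny two-branch Q-network."""
    return {
        "W_img": rng.normal(0.0, 0.1, (IMG_DIM, HIDDEN)),
        "W_map": rng.normal(0.0, 0.1, (MAP_DIM, HIDDEN)),
        "W_out": rng.normal(0.0, 0.1, (2 * HIDDEN, N_ACTIONS)),
    }

def q_values(params, img, pos_map):
    """Forward pass: each branch encodes one state component, and the
    concatenated features are mapped to per-action Q-values."""
    h_img = np.tanh(img @ params["W_img"])
    h_map = np.tanh(pos_map @ params["W_map"])
    h = np.concatenate([h_img, h_map])
    return h @ params["W_out"]

def double_dqn_target(online, target, img2, map2, reward, gamma=0.99, done=False):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it (van Hasselt et al.)."""
    if done:
        return reward
    a_star = int(np.argmax(q_values(online, img2, map2)))
    return reward + gamma * q_values(target, img2, map2)[a_star]
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from vanilla DQN and curbs its overestimation bias; the paper's EDDQN further modifies the reward and Q-value function on top of this scheme.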
