HTRON: Efficient Outdoor Navigation with Sparse Rewards via Heavy Tailed Adaptive Reinforce Algorithm

We present a novel approach to improve the performance of deep reinforcement learning (DRL) based outdoor robot navigation systems. Most existing DRL methods rely on carefully designed dense reward functions to learn efficient behavior in an environment. We circumvent this issue by working only with sparse rewards (which are easy to design) and propose a novel adaptive Heavy-Tailed Reinforce algorithm for Outdoor Navigation called HTRON. Our main idea is to utilize heavy-tailed policy parametrizations, which implicitly induce exploration in sparse-reward settings. We evaluate the performance of HTRON against Reinforce, PPO, and TRPO in three different outdoor scenarios: goal-reaching, obstacle avoidance, and uneven terrain navigation. On average, we observe a 34.41% increase in success rate, a 15.15% decrease in the average time steps taken to reach the goal, and a 24.9% decrease in elevation cost compared to the navigation policies obtained by the other methods. Further, we demonstrate that our algorithm can be transferred directly to a Clearpath Husky robot to perform outdoor terrain navigation in real-world scenarios.
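
To make the core idea concrete, below is a minimal sketch (in PyTorch, not the authors' released code) of a heavy-tailed policy used with REINFORCE: the network parameterizes a Cauchy distribution over continuous actions, whose heavy tails place far more probability mass on large deviations than a Gaussian does, so the agent keeps exploring even when rewards are sparse. The class and function names, architecture, and hyperparameters here are illustrative assumptions, not HTRON's actual implementation.

```python
# Minimal sketch (assumptions, not the authors' code): a heavy-tailed
# Cauchy policy for continuous actions, trained with REINFORCE.
import torch
import torch.nn as nn
from torch.distributions import Cauchy

class HeavyTailedPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.loc = nn.Linear(hidden, act_dim)                 # distribution location
        self.log_scale = nn.Parameter(torch.zeros(act_dim))   # learned scale (log-space)

    def dist(self, obs):
        h = self.body(obs)
        # The Cauchy distribution has undefined mean and variance; its heavy
        # tails make large, exploratory actions much more likely than under
        # a Gaussian policy, which helps under sparse rewards.
        return Cauchy(self.loc(h), self.log_scale.exp())

def reinforce_update(policy, optimizer, obs, acts, returns):
    """One REINFORCE step on a batch of (observation, action, return-to-go)."""
    log_prob = policy.dist(obs).log_prob(acts).sum(-1)
    loss = -(log_prob * returns).mean()   # policy-gradient surrogate loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a training loop, `returns` would hold the sparse return-to-go for each visited state, e.g., 1 for trajectories that reach the goal and 0 otherwise; no dense reward shaping is required.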
