Sample Efficient Reinforcement Learning for Navigation in Complex Environments

Navigating mobile robots through unstructured, time-varying environments is challenging, and it becomes even more difficult in disaster scenarios that combine logistical constraints with technical issues such as reactive, time-varying obstacles. Such scenarios are too complex for classical obstacle-avoidance methods to handle reliably. This paper presents a sample-efficient reinforcement learning algorithm for navigation in complex environments. The approach augments the training data with randomly generated target locations to accelerate learning, and implements a Q-learning agent that trains quickly within a limited number of episodes. The procedure is evaluated in four Gazebo simulation scenarios and one real-world experiment. In the two obstacle-free simulation scenarios, the method learns to navigate to the target in fewer than 200 episodes; with moving obstacles, training takes somewhat longer, but the agent still learns an effective policy quickly.
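To make the target-augmentation idea concrete, the Python sketch below shows one way a goal-conditioned tabular Q-learning loop could relabel each transition against randomly sampled extra targets, so a single interaction yields several learning signals. This is a minimal illustration under assumptions, not the paper's exact method: the environment interface (reset, step, sample_goal), the sparse reward, the action set, and all hyperparameter values are hypothetical placeholders.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # learning rate, discount, exploration (assumed values)
ACTIONS = range(4)                        # e.g. forward/back/left/right (assumed action set)

Q = defaultdict(float)                    # Q[(state, goal, action)] -> value estimate

def reward(state, goal):
    """Sparse goal-conditioned reward: 1 only when the target cell is reached (assumed)."""
    return 1.0 if state == goal else 0.0

def q_update(s, g, a, r, s_next):
    """Standard tabular Q-learning backup for a goal-conditioned value."""
    best_next = max(Q[(s_next, g, a2)] for a2 in ACTIONS)
    Q[(s, g, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, g, a)])

def train_episode(env, goal, extra_goals=4):
    """Run one episode; replay each transition against random extra targets."""
    s, done = env.reset(), False          # env is a hypothetical interface
    while not done:
        # Epsilon-greedy action selection against the current goal.
        a = (random.choice(ACTIONS) if random.random() < EPSILON
             else max(ACTIONS, key=lambda a2: Q[(s, goal, a2)]))
        s_next, done = env.step(a)
        q_update(s, goal, a, reward(s_next, goal), s_next)
        # Augmentation: relabel the same transition with randomly generated
        # target locations, multiplying the training signal per step.
        for g in (env.sample_goal() for _ in range(extra_goals)):
            q_update(s, g, a, reward(s_next, g), s_next)
        s = s_next

The relabeling loop is what makes the scheme sample efficient in this sketch: every environment step produces updates toward several goals instead of one, which is why learning can converge within a few hundred episodes in simple settings.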
