Reinforced Imitation: Sample Efficient Deep Reinforcement Learning for Mapless Navigation by Leveraging Prior Demonstrations

This letter presents a case study of a learning-based approach for target-driven mapless navigation. The underlying navigation model is an end-to-end neural network trained using a combination of expert demonstrations, imitation learning (IL), and reinforcement learning (RL). While RL suffers from large sample complexity and IL from the distribution-mismatch problem, we show that leveraging prior expert demonstrations for pretraining reduces the training time needed to reach at least the same performance level as plain RL by a factor of 5. We present a thorough evaluation of different combinations of expert demonstrations, RL algorithms, and reward functions, both in simulation and on a real robotic platform. Our results show that the final model outperforms both standalone approaches in the number of successfully completed navigation tasks. In addition, the RL reward function can be significantly simplified when using pretraining, e.g., by using only a sparse reward. The learned navigation policy generalizes to unseen and real-world environments.
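The two-stage pipeline described above — supervised pretraining on expert demonstrations, then RL fine-tuning with a sparse reward — can be illustrated with a minimal sketch. The toy problem below (a 1-D corridor with a goal at one end, a logistic policy, behavior cloning for IL, and REINFORCE for RL) is an illustrative assumption, not the paper's actual setup, which uses a deep network on laser-scan inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8  # 1-D corridor; goal at state N-1 (toy stand-in for a navigation task)

def features(s):
    # one-hot encoding of the state
    x = np.zeros(N)
    x[s] = 1.0
    return x

def pi_right(theta, s):
    # probability of moving right under a logistic policy
    return 1.0 / (1.0 + np.exp(-theta @ features(s)))

# --- Stage 1: imitation learning (behavior cloning) on expert demos ---
# the hypothetical expert always moves right, toward the goal
demos = [(s, 1) for s in range(N - 1) for _ in range(20)]
theta = np.zeros(N)
for _ in range(200):
    for s, a in demos:
        p = pi_right(theta, s)
        theta += 0.5 * (a - p) * features(s)  # logistic-regression gradient step

# --- Stage 2: RL fine-tuning with a sparse reward (REINFORCE) ---
def rollout(theta, max_steps=3 * N):
    s, traj = 0, []
    for _ in range(max_steps):
        a = 1 if rng.random() < pi_right(theta, s) else 0
        traj.append((s, a))
        s = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
        if s == N - 1:
            return traj, 1.0  # sparse reward: 1 only upon reaching the goal
    return traj, 0.0  # no shaping terms anywhere else

for _ in range(100):
    traj, R = rollout(theta)
    for s, a in traj:
        p = pi_right(theta, s)
        theta += 0.1 * R * (a - p) * features(s)  # REINFORCE update

success = np.mean([rollout(theta)[1] for _ in range(100)])
```

Without Stage 1, a sparse reward gives REINFORCE no gradient signal until a rollout reaches the goal by chance; the cloned policy starts near the expert, so rewarding rollouts are frequent from the first RL episode. This is the intuition behind the reported speedup and the viability of the simplified reward.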
