Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal

Model-free reinforcement learning has recently been shown to be effective at learning navigation policies from complex image input. However, these algorithms tend to require large amounts of interaction with the environment, which can be prohibitively costly to obtain on robots in the real world. We present an approach for efficiently learning goal-directed navigation policies on a mobile robot from only a single coverage traversal of recorded data. The navigation agent learns an effective policy over a diverse action space in a large heterogeneous environment spanning more than 2 km of travel through buildings and outdoor regions that collectively exhibit large variations in visual appearance, self-similarity, and connectivity. We compare pretrained visual encoders that enable precomputation of visual embeddings, achieving a throughput of tens of thousands of transitions per second at training time on a commodity desktop computer and allowing agents to learn from millions of trajectories of experience in a matter of hours. We propose multiple forms of computationally efficient stochastic augmentation that enable the learned policy to generalise beyond these precomputed embeddings, and we demonstrate successful deployment of the learned policy on the real robot without fine-tuning, despite differences in environmental appearance at test time. The dataset and code required to reproduce these results and to apply the technique to other datasets and robots are publicly available at rl-navigation.github.io/deployable.
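The throughput argument rests on running the visual encoder once per recorded frame and then serving cached embeddings during training, with cheap stochastic perturbations applied in embedding space rather than in pixel space. The abstract does not specify which augmentations are used, so the following is only a minimal sketch under stated assumptions: the names `EmbeddingCache`, `augment`, `observe` and the parameters `noise_std`, `drop_prob` and `jitter` are hypothetical, the toy encoder stands in for a frozen pretrained network, and feature dropout, Gaussian noise and frame-index jitter are illustrative stand-ins rather than the specific augmentations of this work.

```python
import random
from typing import Callable, List, Sequence

Embedding = List[float]


class EmbeddingCache:
    """Encode every recorded frame once, then serve cached embeddings.

    `encoder` is a hypothetical stand-in for a frozen pretrained network.
    """

    def __init__(self, frames: Sequence[object],
                 encoder: Callable[[object], Embedding]) -> None:
        # One encoder pass per frame, paid once before training starts.
        self.embeddings = [list(encoder(f)) for f in frames]

    def __len__(self) -> int:
        return len(self.embeddings)

    def get(self, index: int) -> Embedding:
        # Training-time lookup: no image decoding or network forward pass.
        return self.embeddings[index]


def augment(emb: Embedding, rng: random.Random,
            noise_std: float = 0.05, drop_prob: float = 0.1) -> Embedding:
    """Hypothetical embedding-space augmentation: feature dropout plus Gaussian noise."""
    out = []
    for x in emb:
        if rng.random() < drop_prob:
            out.append(0.0)  # zero out this feature
        else:
            out.append(x + rng.gauss(0.0, noise_std))  # perturb this feature
    return out


def observe(cache: EmbeddingCache, index: int, rng: random.Random,
            jitter: int = 1, noise_std: float = 0.05,
            drop_prob: float = 0.1) -> Embedding:
    """Return an augmented observation for the frame at `index`.

    Hypothetical: the frame index is also shifted by up to `jitter` positions
    along the traversal, so the agent sees nearby viewpoints of the same place.
    """
    j = index + rng.randint(-jitter, jitter)
    j = max(0, min(len(cache) - 1, j))  # clamp to the recorded traversal
    return augment(cache.get(j), rng, noise_std=noise_std, drop_prob=drop_prob)


def toy_encoder(frame: int) -> Embedding:
    # Placeholder for a pretrained CNN; maps a frame id to a 3-D "embedding".
    return [float(frame), float(frame) * 0.5, 1.0]


# Usage: cache 100 toy frames once, then draw augmented observations cheaply.
cache = EmbeddingCache(list(range(100)), toy_encoder)
rng = random.Random(0)
obs = observe(cache, 50, rng)
```

Because augmentation operates on short vectors rather than images, its cost per transition stays small relative to the policy update, which is the property the precomputation strategy depends on.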
