From Visual Place Recognition to Navigation: Learning Sample-Efficient Control Policies across Diverse Real World Environments

Visual navigation tasks in real world environments often require both self-motion and place recognition feedback. While deep reinforcement learning has shown success in solving these perception and decision-making problems in an end-to-end manner, these algorithms require large amounts of experience to learn navigation policies from high-dimensional inputs, which is generally impractical for real robots due to sample complexity. In this paper, we address these problems with two main contributions. We first leverage place recognition and deep learning techniques combined with goal destination feedback to generate compact, bimodal images representations that can then be used to effectively learn control policies at kilometer scale from a small amount of experience. Second, we present an interactive and realistic framework, called CityLearn, that enables for the first time the training of navigation algorithms across city-sized, real-world environments with extreme environmental changes. CityLearn features over 10 benchmark real-world datasets often used in place recognition research with more than 100 recorded traversals and across 60 cities around the world. We evaluate our approach in two CityLearn environments where our navigation policy is trained using a single traversal. Results show our method can be over 2 orders of magnitude faster than when using raw images and can also generalize across extreme visual changes including day to night and summer to winter transitions.

[1]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[3]  Niko Sünderhauf,et al.  Are We There Yet? Challenging SeqSLAM on a 3000 km Journey Across All Four Seasons , 2013 .

[4]  Illah R. Nourbakhsh,et al.  An Affective Mobile Robot Educator with a Full-Time Job , 1999, Artif. Intell..

[5]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[6]  Vijay Kumar,et al.  Memory Augmented Control Networks , 2017, ICLR.

[7]  Sebastian Thrun,et al.  Probabilistic robotics , 2002, CACM.

[8]  Wolfram Burgard,et al.  Monte Carlo localization for mobile robots , 1999, Proceedings 1999 IEEE International Conference on Robotics and Automation (Cat. No.99CH36288C).

[9]  Michael Milford,et al.  LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics , 2018, Robotics: Science and Systems.

[10]  Alex Graves,et al.  Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.

[11]  Gordon Wyeth,et al.  Mapping a Suburb With a Single Camera Using a Biologically Inspired SLAM System , 2008, IEEE Transactions on Robotics.

[12]  Hugh F. Durrant-Whyte,et al.  An Autonomous Guided Vehicle for Cargo Handling Applications , 1995, ISER.

[13]  Ali Farhadi,et al.  Target-driven visual navigation in indoor scenes using deep reinforcement learning , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[16]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[17]  Gordon Wyeth,et al.  FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day , 2010, 2010 IEEE International Conference on Robotics and Automation.

[18]  Ruslan Salakhutdinov,et al.  Active Neural Localization , 2018, ICLR.

[19]  Peter I. Corke,et al.  Visual Place Recognition: A Survey , 2016, IEEE Transactions on Robotics.

[20]  Byoungkwon An,et al.  Looking Beyond the Visible Scene , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Razvan Pascanu,et al.  Learning to Navigate in Complex Environments , 2016, ICLR.

[22]  Wojciech Jaskowski,et al.  ViZDoom: A Doom-based AI research platform for visual reinforcement learning , 2016, 2016 IEEE Conference on Computational Intelligence and Games (CIG).

[23]  Ilya Kostrikov,et al.  PlaNet - Photo Geolocation with Convolutional Neural Networks , 2016, ECCV.

[24]  Inkyu Sa,et al.  Only look once, mining distinctive landmarks from ConvNet for visual place recognition , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[25]  Shane Legg,et al.  DeepMind Lab , 2016, ArXiv.

[26]  Gordon Wyeth,et al.  RatSLAM: a hippocampal model for simultaneous localization and mapping , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[27]  Alberto Ferreira de Souza,et al.  Heading Direction Estimation Using Deep Learning with Automatic Large-scale Data Acquisition , 2018, 2018 International Joint Conference on Neural Networks (IJCNN).

[28]  Klaus D. McDonald-Maier,et al.  Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions , 2019, ArXiv.

[29]  Michael Milford,et al.  Learning Deployable Navigation Policies at Kilometer Scale from a Single Traversal , 2018, CoRL.

[30]  Jason Weston,et al.  Talk the Walk: Navigating New York City through Grounded Dialogue , 2018, ArXiv.

[31]  Qiang Xu,et al.  nuScenes: A Multimodal Dataset for Autonomous Driving , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Volkan Cirik Following Formulaic Map Instructions in a Street Simulation Environment , 2018 .

[33]  Shane Legg,et al.  IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures , 2018, ICML.

[34]  Wojciech Zaremba,et al.  Learning to Execute , 2014, ArXiv.

[35]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[36]  Michael Milford,et al.  Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[37]  Michael Milford,et al.  Deep learning features at scale for visual place recognition , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[38]  Michael Warren,et al.  Unaided stereo vision based pose estimation , 2010, ICRA 2010.

[39]  Yang Gao,et al.  End-to-End Learning of Driving Models from Large-Scale Video Datasets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Paul Newman,et al.  1 year, 1000 km: The Oxford RobotCar dataset , 2017, Int. J. Robotics Res..

[41]  Michael Milford,et al.  Multi-Process Fusion: Visual Place Recognition Using Multiple Image Processing Methods , 2019, IEEE Robotics and Automation Letters.

[42]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[43]  Michael Milford,et al.  Vision-based place recognition: how low can you go? , 2013, Int. J. Robotics Res..

[44]  Michael Milford,et al.  Sequence searching with deep-learnt depth for condition- and viewpoint-invariant route-based place recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[45]  Long-Ji Lin,et al.  Reinforcement learning for robots using neural networks , 1992 .

[46]  Wolfram Burgard,et al.  Robust Visual Localization Across Seasons , 2018, IEEE Transactions on Robotics.

[47]  Raia Hadsell,et al.  Learning to Navigate in Cities Without a Map , 2018, NeurIPS.

[48]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[49]  Samarth Brahmbhatt,et al.  DeepNav: Learning to Navigate Large Cities , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Sergey Levine,et al.  Learning modular neural network policies for multi-task and multi-robot transfer , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[51]  Guillaume Lample,et al.  Playing FPS Games with Deep Reinforcement Learning , 2016, AAAI.

[52]  Yoav Artzi,et al.  TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.