Semantics-aware visual localization under challenging perceptual conditions

Visual place recognition under difficult perceptual conditions remains a challenging problem due to changes in weather, illumination, and season. Long-term visual navigation approaches for robot localization should be robust to these environmental dynamics. Existing methods typically leverage feature descriptors of whole images or image regions extracted from deep convolutional neural networks, and some also exploit sequential information to compensate for spatially inconsistent and imperfect image matches. In this paper, we propose a novel approach for learning a discriminative holistic image representation that exploits the image content to create a dense and salient scene description. These salient descriptions are learnt across a variety of datasets exhibiting large perceptual changes, which enables us to precisely segment the regions of an image that remain geometrically stable over large time lags. We combine features from these salient regions with an off-the-shelf holistic representation to form a more robust scene descriptor. We also introduce a semantically labeled dataset that captures extreme perceptual and structural scene dynamics over the course of three years. We evaluate our approach with extensive experiments on data collected over several kilometers in Freiburg and show that our learnt image representation outperforms both off-the-shelf deep network features and hand-crafted features.
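The abstract describes fusing features pooled over learnt salient (geometrically stable) regions with an off-the-shelf holistic descriptor. The sketch below illustrates one plausible form of such a fusion, assuming a CNN activation map and a predicted saliency mask are already available; the pooling scheme, normalization, and similarity measure here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Unit-normalize a vector so concatenated parts contribute equally.
    return v / (np.linalg.norm(v) + eps)

def fused_descriptor(conv_features, saliency_mask):
    """Illustrative fusion of a holistic and a saliency-weighted descriptor.

    conv_features: (H, W, C) activation map from a CNN layer (assumed given).
    saliency_mask: (H, W) weights in [0, 1] marking stable regions (assumed given).
    """
    # Holistic part: global average pooling over all spatial locations.
    holistic = conv_features.mean(axis=(0, 1))
    # Salient part: average pooling weighted by the predicted stability mask.
    w = saliency_mask[..., None]
    salient = (conv_features * w).sum(axis=(0, 1)) / (w.sum() + 1e-12)
    # Concatenate the two L2-normalized parts into one scene descriptor.
    return np.concatenate([l2_normalize(holistic), l2_normalize(salient)])

def similarity(d1, d2):
    # Cosine-style similarity between query and database descriptors.
    return float(np.dot(d1, d2))
```

Matching a query image against a database then reduces to nearest-neighbor search with `similarity` over the fused descriptors.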
