Deep learning features at scale for visual place recognition

The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area.

[1]  Roberto Cipolla,et al.  PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2]  Gordon Wyeth,et al.  FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day , 2010, 2010 IEEE International Conference on Robotics and Automation.

[3]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[5]  Michael Milford,et al.  Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free , 2015, Robotics: Science and Systems.

[6]  Michael Milford,et al.  Convolutional Neural Network-based Place Recognition , 2014, ICRA 2014.

[7]  Andrew Calway,et al.  Visual Place Recognition Using Landmark Distribution Descriptors , 2016, ACCV.

[8]  Achim J. Lilienthal,et al.  SIFT, SURF & seasons: Appearance-based long-term localization in outdoor environments , 2010, Robotics Auton. Syst..

[9]  Tomás Pajdla,et al.  NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Victor S. Lempitsky,et al.  Aggregating Local Deep Features for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[12]  Javier González,et al.  Training a Convolutional Neural Network for Appearance-Invariant Place Recognition , 2015, ArXiv.

[13]  Paul Newman,et al.  Illumination Invariant Imaging : Applications in Robust Vision-based Localisation , Mapping and Classification for Autonomous Vehicles , 2014 .

[14]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[15]  Niko Sünderhauf,et al.  Superpixel-based appearance change prediction for long-term navigation across seasons , 2014, Robotics Auton. Syst..

[16]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  David Stutz,et al.  Neural Codes for Image Retrieval , 2015 .

[18]  Paul Newman,et al.  FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance , 2008, Int. J. Robotics Res..

[19]  Paul Newman,et al.  Scene Signatures: Localised and Point-less Features for Localisation , 2014, Robotics: Science and Systems.

[20]  Michael Milford,et al.  Vision-based place recognition: how low can you go? , 2013, Int. J. Robotics Res..

[21]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[22]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[23]  Gordon Wyeth,et al.  Transforming morning to afternoon using linear regression techniques , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[24]  Anton van den Hengel,et al.  The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jean-Arcady Meyer,et al.  Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual Words , 2008, IEEE Transactions on Robotics.

[26]  Paul Newman,et al.  Highly scalable appearance-only SLAM - FAB-MAP 2.0 , 2009, Robotics: Science and Systems.

[27]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Niko Sünderhauf,et al.  On the performance of ConvNet features for place recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  Niko Sünderhauf,et al.  Appearance change prediction for long-term navigation across seasons , 2013, 2013 European Conference on Mobile Robots.

[32]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[33]  Robert Pless,et al.  Consistent Temporal Variations in Many Outdoor Scenes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Wolfram Burgard,et al.  Robust Visual Robot Localization Across Seasons Using Network Flows , 2014, AAAI.