Training a Convolutional Neural Network for Appearance-Invariant Place Recognition

Place recognition is one of the most challenging problems in computer vision, and has become a key part in mobile robotics and autonomous driving applications for performing loop closure in visual SLAM systems. Moreover, the difficulty of recognizing a revisited location increases with appearance changes caused, for instance, by weather or illumination variations, which hinders the long-term application of such algorithms in real environments. In this paper we present a convolutional neural network (CNN), trained for the first time with the purpose of recognizing revisited locations under severe appearance changes, which maps images to a low dimensional space where Euclidean distances represent place dissimilarity. In order for the network to learn the desired invariances, we train it with triplets of images selected from datasets which present a challenging variability in visual appearance. The triplets are selected in such way that two samples are from the same location and the third one is taken from a different place. We validate our system through extensive experimentation, where we demonstrate better performance than state-of-art algorithms in a number of popular datasets.

[1]  Peter I. Corke,et al.  All-environment visual place recognition with SMART , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[2]  J. M. M. Montiel,et al.  ORB-SLAM: A Versatile and Accurate Monocular SLAM System , 2015, IEEE Transactions on Robotics.

[3]  Niko Sünderhauf,et al.  On the performance of ConvNet features for place recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4]  Vincent Lepetit,et al.  Learning descriptors for object recognition and 3D pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Michael Milford,et al.  Convolutional Neural Network-based Place Recognition , 2014, ICRA 2014.

[6]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[7]  Niko Sünderhauf,et al.  Are We There Yet? Challenging SeqSLAM on a 3000 km Journey Across All Four Seasons , 2013 .

[8]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[9]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[10]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[11]  Niko Sünderhauf,et al.  Appearance change prediction for long-term navigation across seasons , 2013, 2013 European Conference on Mobile Robots.

[12]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[14]  Zhanyi Hu,et al.  MSLD: A robust descriptor for line matching , 2009, Pattern Recognit..

[15]  Il Hong Suh,et al.  Outdoor place recognition in urban environments using straight lines , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[16]  Dorian Gálvez-López,et al.  Bags of Binary Words for Fast Place Recognition in Image Sequences , 2012, IEEE Transactions on Robotics.

[17]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[18]  Vincent Lepetit,et al.  BRIEF: Binary Robust Independent Elementary Features , 2010, ECCV.

[19]  Gordon Wyeth,et al.  SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights , 2012, 2012 IEEE International Conference on Robotics and Automation.

[20]  Anil A. Bharath,et al.  Appearance-based indoor localization: A comparison of patch descriptor performance , 2015, Pattern Recognit. Lett..

[21]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[22]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[24]  Juan D. Tardós,et al.  Fast relocalisation and loop closing in keyframe-based SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[25]  Francisco Angel Moreno,et al.  The Málaga urban dataset: High-rate stereo and LiDAR in a realistic urban scenario , 2014, Int. J. Robotics Res..

[26]  Michael Milford,et al.  Vision-based place recognition: how low can you go? , 2013, Int. J. Robotics Res..

[27]  Paul Newman,et al.  FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance , 2008, Int. J. Robotics Res..

[28]  Gary R. Bradski,et al.  ORB: An efficient alternative to SIFT or SURF , 2011, 2011 International Conference on Computer Vision.