Real-Time Visual Place Recognition Based on Analyzing Distribution of Multi-scale CNN Landmarks

Visual place recognition is difficult because real-world places vary. In this work, an effective similarity measurement is proposed for visual place recognition in changing environments, based on Convolutional Neural Networks (CNNs) and content-based multi-scale landmarks. The image is first segmented into multi-scale landmarks according to its content in order to adapt to viewpoint variations. Highly representative features of these landmarks, which are robust against appearance variations, are then derived from a CNN. The similarity between two images is determined by analyzing both the spatial and the scale distributions of the matched landmarks. Moreover, an efficient feature extraction and reduction strategy is proposed that generates the features of all landmarks in a single pass, which makes the method suitable for real-time applications. The proposed method is evaluated on two widely used datasets with varied viewpoint and appearance conditions and outperforms four other state-of-the-art methods, including the bag-of-words model DBoW3 and CNN-based Edge Boxes landmarks. Extensive experiments demonstrate that integrating global and local information provides more invariance under severe appearance changes, and that considering the spatial distribution of landmarks improves robustness against viewpoint changes.
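
The abstract does not give the similarity formula, so the following is only an illustrative sketch of how spatial and scale distributions of matched landmarks might be combined into one image-level score. Everything in it is an assumption made for illustration: the function names `match_landmarks` and `image_similarity`, the mutual nearest-neighbour matching on cosine similarity, the area-ratio scale weight, and the median-shift spatial weight are not taken from the paper. Landmark descriptors are assumed to have been extracted already (for example from a CNN layer) and are passed in as plain arrays.

```python
import numpy as np


def match_landmarks(feats_a, feats_b):
    """Mutual nearest-neighbour matching on cosine similarity (assumed scheme)."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T                      # pairwise cosine similarities
    best_b = sim.argmax(axis=1)        # best match in B for each landmark in A
    best_a = sim.argmax(axis=0)        # best match in A for each landmark in B
    return [(i, int(j), float(sim[i, j]))
            for i, j in enumerate(best_b) if best_a[j] == i]


def image_similarity(feats_a, boxes_a, feats_b, boxes_b, image_size):
    """Hypothetical image-level score from matched landmarks.

    boxes_*: (n, 4) float arrays of (x, y, w, h); image_size: (width, height).
    Each match is weighted by its descriptor similarity, by how similar the
    two landmark areas are (scale term), and by how well its displacement
    agrees with the median displacement of all matches (spatial term).
    """
    matches = match_landmarks(feats_a, feats_b)
    if not matches:
        return 0.0
    diag = float(np.hypot(*image_size))
    centres_a = boxes_a[:, :2] + boxes_a[:, 2:] / 2.0
    centres_b = boxes_b[:, :2] + boxes_b[:, 2:] / 2.0
    shifts = np.array([centres_b[j] - centres_a[i] for i, j, _ in matches])
    median_shift = np.median(shifts, axis=0)

    score = 0.0
    for (i, j, s), shift in zip(matches, shifts):
        area_a = boxes_a[i, 2] * boxes_a[i, 3]
        area_b = boxes_b[j, 2] * boxes_b[j, 3]
        scale_w = min(area_a, area_b) / max(area_a, area_b)
        spatial_w = max(0.0, 1.0 - np.linalg.norm(shift - median_shift) / diag)
        score += max(s, 0.0) * scale_w * spatial_w
    # Normalise by the larger landmark count so unmatched landmarks lower the score.
    return score / max(len(feats_a), len(feats_b))


# Toy usage: three landmarks with orthogonal descriptors.
feats = np.eye(3)
boxes = np.array([[10, 10, 50, 40], [200, 80, 120, 90], [0, 0, 640, 480]], dtype=float)
same = image_similarity(feats, boxes, feats, boxes, (640, 480))

moved = boxes.copy()
moved[0, 0] += 100.0   # one landmark moves inconsistently with the others
shifted = image_similarity(feats, boxes, feats, moved, (640, 480))
```

In this toy case the identical pair gets the highest possible score, and the pair with one inconsistently displaced landmark scores lower. The multiplicative weighting is just one plausible design; an additive or thresholded combination would fit the abstract's description equally well.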
