Loop closure detection using CNN words

Loop closure detection (LCD) is crucial for the simultaneous localization and mapping (SLAM) system of an autonomous robot. Image features from convolutional neural networks (CNNs) have been widely used for LCD in recent years. Instead of directly using the feature vectors to compute image similarity, we propose a novel, easy-to-implement method that organizes CNN features to improve performance. In this method, the elements of feature maps from a higher layer of the CNN are clustered to generate CNN words (CNNW). To encode the spatial information of CNNW, we create word pairs (CNNWP) from the single words, which further improves performance. In addition, traditional techniques used in bag-of-words (BoW) methods are integrated into our approach. We also demonstrate that feature maps from lower layers can serve as descriptors for local region matching between images, which allows geometric verification of candidate loop closures, as in BoW methods. The experimental results demonstrate that our method substantially outperforms state-of-the-art methods that directly use CNN features for LCD.
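To make the CNN-word idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes (since the abstract gives no details) that each spatial column of a higher-layer feature map of shape (C, H, W) is treated as a local descriptor, that a k-means codebook quantizes each column into a CNN word, that horizontally adjacent words are paired to form word pairs, and that images are compared via cosine similarity of word-pair histograms.

```python
# Sketch of CNN words (CNNW) and word pairs (CNNWP); all design choices here
# (column descriptors, k-means codebook, right-neighbour pairing, cosine
# similarity) are illustrative assumptions, not the authors' exact method.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def fit_codebook(feature_maps, n_words=64, seed=0):
    """Cluster feature-map columns from many images into CNN words."""
    cols = np.concatenate(
        [fm.reshape(fm.shape[0], -1).T for fm in feature_maps], axis=0
    )  # (N, C): one C-dimensional descriptor per spatial location
    return MiniBatchKMeans(n_clusters=n_words, random_state=seed).fit(cols)

def word_pair_histogram(feature_map, codebook):
    """Quantize one (C, H, W) map into words, then histogram adjacent word pairs."""
    C, H, W = feature_map.shape
    words = codebook.predict(feature_map.reshape(C, -1).T).reshape(H, W)
    k = codebook.n_clusters
    hist = np.zeros(k * k)
    # Pair each word with its right-hand neighbour to keep coarse spatial layout.
    for y in range(H):
        for x in range(W - 1):
            hist[words[y, x] * k + words[y, x + 1]] += 1
    return hist / max(hist.sum(), 1.0)

def similarity(hist_a, hist_b):
    """Cosine similarity between two word-pair histograms (loop-closure score)."""
    denom = np.linalg.norm(hist_a) * np.linalg.norm(hist_b)
    return float(hist_a @ hist_b / denom) if denom > 0 else 0.0
```

In a BoW-style pipeline, the codebook would be trained offline, each incoming frame converted to a word-pair histogram, and candidate loop closures ranked by this similarity before geometric verification with lower-layer descriptors.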
