Fast and Incremental Loop Closure Detection Using Proximity Graphs

Visual loop closure detection, which can be considered an image retrieval task, is an important problem in SLAM (Simultaneous Localization and Mapping) systems. The frequently used bag-of-words (BoW) models achieve high precision and moderate recall, but they do not meet the low time and memory budgets of mobile robot applications well. In this paper, we propose a novel loop closure detection framework titled FILD (Fast and Incremental Loop closure Detection), which focuses on online and incremental graph vocabulary construction for fast loop closure detection. Global and local features of each frame are extracted using a Convolutional Neural Network (CNN) and SURF, both running on the GPU, which guarantees extremely fast extraction. The graph vocabulary is built on a type of proximity graph, the Hierarchical Navigable Small World (HNSW) graph, modified to suit this specific application. This process is coupled with a novel strategy for real-time geometrical verification that keeps only binary hash codes and therefore saves memory significantly. Extensive experiments on several publicly available datasets show that the proposed approach achieves competitive recall at 100% precision compared with other state-of-the-art methods. The source code can be downloaded at https://github.com/AnshanTJU/FILD for further study.
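The authors' released implementation is the authoritative reference; purely as an illustrative sketch, the snippet below shows how an incremental HNSW index (here via the hnswlib Python bindings, an assumption rather than the paper's own code path) can be grown frame by frame from CNN global descriptors and queried for loop-closure candidates. The descriptor dimension, index parameters (M, ef_construction, ef), the temporal gap min_gap, and the helper process_frame are all hypothetical choices for the example, and the paper's SURF-based geometrical verification step is omitted.

    import numpy as np
    import hnswlib

    dim = 1024  # assumed dimensionality of a CNN global descriptor

    # Incremental HNSW index over frame descriptors (parameters are typical values, not the paper's)
    index = hnswlib.Index(space='l2', dim=dim)
    index.init_index(max_elements=10000, ef_construction=200, M=16)
    index.set_ef(50)  # search-time breadth/accuracy trade-off

    def process_frame(frame_id, descriptor, min_gap=50, k=5):
        """Query existing frames for loop-closure candidates, then insert the new frame.
        Frames closer than `min_gap` in time are skipped to avoid trivial matches."""
        candidates = []
        count = index.get_current_count()
        if count > 0:
            labels, dists = index.knn_query(descriptor, k=min(k, count))
            candidates = [(int(l), float(d))
                          for l, d in zip(labels[0], dists[0])
                          if frame_id - int(l) > min_gap]
        index.add_items(descriptor, np.array([frame_id]))
        return candidates

    # Usage: random vectors stand in for real CNN features of incoming frames
    for fid in range(200):
        desc = np.random.rand(1, dim).astype(np.float32)
        loops = process_frame(fid, desc)

In a full pipeline, each candidate returned here would still be checked by geometrical verification (in the paper, via binary hash codes of local features) before being accepted as a loop closure.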
