Revisiting the Bag-of-Visual-Words model: A hierarchical localization architecture for mobile systems

Abstract This paper proposes an enhanced visual place recognition system that aims to improve the localization performance of a mobile platform. The technique exploits the continuous input image stream to supply additional information to the matching stage. The well-established Bag-of-Visual-Words model is adapted into a hierarchical design that draws on the visual information of the entire natural scene for the description while preserving the geometric structure of the explored world. The approach is evaluated as part of a state-of-the-art Simultaneous Localization and Mapping (SLAM) algorithm, and parallelization techniques make use of every available hardware module in a low-power device. The implemented algorithm has been tested on several publicly available datasets, where it delivers consistently accurate localization results and avoids the majority of the redundant computations that the additional geometrical verifications can induce.
