Visual Loop Closing using Gist Descriptors in Manhattan World

We present an approach for detecting loop closures in a large sequence of omnidirectional images of urban environments. In particular, we investigate the efficacy of global gist descriptors computed for 360° cylindrical panoramas and compare them with the baseline vocabulary tree approach. In the context of loop closure detection, we describe a novel matching strategy for panoramic views. It exploits the fact that the vehicle travels in urban environments where its headings at previously visited locations and at loop closure points are related by multiples of 90°. The performance of the presented approach is promising despite the simplicity of the descriptor.

I. INTRODUCTION

The problem of generating metric and/or topological maps from streams of visual data has become a very active area of research in recent years. This increased interest has been facilitated to a large extent by improvements in large-scale wide-baseline matching techniques and by advances in localization by means of place recognition. For purely appearance-based strategies, localization by means of place recognition is typically formulated as an image retrieval task: given a database of views from a certain geographical area and a set of new query views, the goal is to determine the closest view in the reference database.

The loop closure detection problem we investigate here is different in nature, because it explicitly takes into account temporal ordering constraints among the views instead of treating the database as an unorganized collection of views. Loop closure detection requires determining, for two images, whether they were taken at the same place. In principle, the problem can be tackled with the same strategies as those used in location recognition: given n views of the video sequence, loops are hypothesized by comparing all views to all other views.
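The all-pairs strategy described above can be illustrated with a minimal Python sketch. It assumes each view has already been reduced to a global descriptor vector (for example a gist vector) and compares descriptors with Euclidean distance. It models the temporal ordering constraint in the simplest way, by comparing each view only with views at least `min_gap` frames earlier. The function `detect_loop_closures` and the parameters `min_gap` and `threshold` are hypothetical illustrations, not an interface or values taken from this paper.

```python
import math
from typing import List, Optional, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_loop_closures(
    descriptors: List[Sequence[float]],
    min_gap: int = 50,        # hypothetical: frames this close in time are skipped
    threshold: float = 0.5,   # hypothetical: largest distance accepted as a loop
) -> List[Tuple[int, int, float]]:
    """Return (query, match, distance) triples for hypothesized loop closures.

    Each view i is compared with every earlier view j satisfying i - j >= min_gap,
    so temporally adjacent frames, which trivially look alike, are never reported.
    """
    closures: List[Tuple[int, int, float]] = []
    for i, query in enumerate(descriptors):
        best: Optional[Tuple[int, float]] = None
        # Only views at least min_gap frames in the past are candidates
        # (range() with a non-positive bound yields nothing).
        for j in range(i - min_gap + 1):
            d = l2_distance(query, descriptors[j])
            if best is None or d < best[1]:
                best = (j, d)
        # Accept the best earlier match only if it is close enough.
        if best is not None and best[1] < threshold:
            closures.append((i, best[0], best[1]))
    return closures


if __name__ == "__main__":
    # Toy 2-D "descriptors": the last view revisits the place of view 0.
    views = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.1, 0.0]]
    print(detect_loop_closures(views, min_gap=3, threshold=0.5))
```

Comparing every query with all earlier views costs O(n²) descriptor comparisons over a sequence of n views, which is why the size of the image representation matters for scalability.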
The efficiency and scalability of the existing strategies depend on the chosen image representation and the selected similarity measure. In this paper we investigate the suitability of the global gist descriptor as an image representation and propose a novel panorama similarity measure between two views. This measure exploits the Manhattan world assumption, which states that the vehicle headings at previously visited locations and in the current views are related by multiples of 90°. We will demonstrate that, despite the simplicity and compactness of the global gist descriptor, its discriminability is quite high, partly due to the 360° field of view.
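The Manhattan world comparison can be sketched as follows. This sketch assumes each 360° panorama is split into four 90° sectors, each summarized by its own precomputed gist vector, and that the dissimilarity of two panoramas is the sum of per-sector Euclidean distances. Because the headings are assumed to differ by a multiple of 90°, only the four circular shifts of the sectors need to be tested, and the smallest distance is kept. The sector split, the summed Euclidean distance, the shift sign convention, and the name `manhattan_panorama_distance` are illustrative assumptions rather than the paper's exact formulation; computing the gist vectors themselves is not shown.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def l2_distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two equal-length gist vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_panorama_distance(
    query: List[Descriptor], reference: List[Descriptor]
) -> Tuple[float, int]:
    """Distance between two panoramas, each given as four 90-degree sector descriptors.

    Assumption: the two headings differ by k * 90 degrees, k in {0, 1, 2, 3},
    which corresponds to a circular shift of the sectors by k.
    Returns the smallest summed sector distance and the shift (in degrees) achieving it.
    """
    if len(query) != 4 or len(reference) != 4:
        raise ValueError("expected four 90-degree sectors per panorama")
    best_distance, best_shift = math.inf, 0
    for k in range(4):
        # Align query sector (s + k) mod 4 with reference sector s.
        d = sum(l2_distance(query[(s + k) % 4], reference[s]) for s in range(4))
        if d < best_distance:
            best_distance, best_shift = d, 90 * k
    return best_distance, best_shift


if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 0.0]]
    # The same place seen with the vehicle heading turned by one sector.
    query = [reference[3], reference[0], reference[1], reference[2]]
    print(manhattan_panorama_distance(query, reference))  # -> (0.0, 90)
```

Testing four discrete shifts instead of a continuous range of rotations keeps the cost at four sector alignments per panorama pair, while still recovering the relative heading up to the 90° quantization.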
