Vision-based Global Localization Using a Visual Vocabulary

This paper presents a novel coarse-to-fine global localization approach inspired by object recognition and text retrieval techniques. Harris-Laplace interest points characterized by SIFT descriptors serve as natural landmarks. The descriptors are indexed into two databases: an inverted index and a location database. The inverted index is built on a visual vocabulary learned from the feature descriptors. In the location database, each location is represented directly by a set of scale-invariant descriptors. Localization proceeds in two stages: coarse and fine. Coarse localization from the inverted index is fast but not accurate enough, whereas fine localization from the location database, which uses a voting algorithm, is slower but more accurate. Combining the two stages makes fast and reliable localization possible. If necessary, the localization result can also be verified using the epipolar geometry between the representative view in the database and the view to be localized. Experimental results show that our approach is efficient and reliable.
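
The two-stage lookup described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes toy 2-D descriptors in place of 128-D SIFT vectors, a hand-picked vocabulary rather than one learned by clustering, a coarse score that simply counts shared visual words, and a nearest-descriptor vote with a distance cutoff for the fine stage. All names (`nearest_word`, `coarse_localize`, `fine_localize`, `max_dist`, the location labels) are hypothetical, and the epipolar verification step is omitted.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(desc, vocabulary):
    """Index of the visual word (cluster centre) closest to a descriptor."""
    return min(range(len(vocabulary)), key=lambda i: sq_dist(desc, vocabulary[i]))


def build_inverted_index(location_db, vocabulary):
    """Map each visual word to the set of locations in which it occurs."""
    index = defaultdict(set)
    for loc, descriptors in location_db.items():
        for d in descriptors:
            index[nearest_word(d, vocabulary)].add(loc)
    return index


def coarse_localize(query, vocabulary, index, top_k=2):
    """Coarse stage: rank locations by how many query words they contain."""
    scores = Counter()
    for word in {nearest_word(d, vocabulary) for d in query}:
        for loc in index.get(word, ()):
            scores[loc] += 1
    return [loc for loc, _ in scores.most_common(top_k)]


def fine_localize(query, location_db, candidates, max_dist=0.5):
    """Fine stage: each query descriptor votes for the candidate location
    that holds its nearest stored descriptor (assumed cutoff: max_dist)."""
    votes = Counter()
    for d in query:
        best_loc, best = None, max_dist ** 2
        for loc in candidates:
            for ref in location_db[loc]:
                dist = sq_dist(d, ref)
                if dist < best:
                    best_loc, best = loc, dist
        if best_loc is not None:
            votes[best_loc] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy data (assumed for illustration only).
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
location_db = {
    "corridor": [(0.0, 0.1), (0.9, 0.0), (0.1, 0.0)],
    "office": [(0.0, 0.9), (1.0, 1.0), (0.9, 0.1)],
    "lab": [(1.0, 0.9), (0.1, 1.0)],
}
index = build_inverted_index(location_db, vocabulary)

query = [(0.05, 0.1), (0.92, 0.0), (0.0, 0.95)]
candidates = coarse_localize(query, vocabulary, index, top_k=2)
result = fine_localize(query, location_db, candidates)
print(candidates, result)
```

The coarse stage only touches the inverted index, so its cost depends on the number of query words rather than on the number of stored descriptors. The fine stage compares descriptors directly, but only against the few candidate locations that survive the coarse stage.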
