Fast image-based localization using direct 2D-to-3D matching

Recently developed Structure from Motion (SfM) reconstruction approaches enable the creation of large scale 3D models of urban scenes. These compact scene representations can then be used for accurate image-based localization, creating the need for localization approaches that are able to efficiently handle such large amounts of data. An important bottleneck is the computation of 2D-to-3D correspondences required for pose estimation. Current stateof- the-art approaches use indirect matching techniques to accelerate this search. In this paper we demonstrate that direct 2D-to-3D matching methods have a considerable potential for improving registration performance. We derive a direct matching framework based on visual vocabulary quantization and a prioritized correspondence search. Through extensive experiments, we show that our framework efficiently handles large datasets and outperforms current state-of-the-art methods.

[1]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[2]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[3]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Roberto Cipolla,et al.  An Image-Based System for Urban Navigation , 2004, BMVC.

[6]  Jiri Matas,et al.  Optimal Randomized RANSAC , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Richard Szeliski,et al.  Modeling the World from Internet Photo Collections , 2008, International Journal of Computer Vision.

[8]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[9]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[10]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[11]  James J. Little,et al.  Global localization using distinctive visual features , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  Luc Van Gool,et al.  I know what you did last summer: object-level auto-annotation of holiday snaps , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[13]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[14]  Alvaro Collet,et al.  Making specific features less discriminative to improve point-based 3D object recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Jan-Michael Frahm,et al.  Detailed Real-Time Urban 3D Reconstruction from Video , 2007, International Journal of Computer Vision.

[16]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[17]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[18]  Klas Josephson,et al.  Pose estimation with radial distortion and unknown focal length , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  David W. Murray,et al.  Video-rate localization in multiple maps for wearable augmented reality , 2008, 2008 12th IEEE International Symposium on Wearable Computers.

[20]  Pascal Fua,et al.  Dynamic and scalable large scale image reconstruction , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[23]  Tom Drummond,et al.  Scalable Monocular SLAM , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[25]  Jan-Michael Frahm,et al.  Building Rome on a Cloudless Day , 2010, ECCV.

[26]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[28]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[29]  Dieter Schmalstieg,et al.  Wide area localization on mobile phones , 2009, 2009 8th IEEE International Symposium on Mixed and Augmented Reality.

[30]  Paul Newman,et al.  FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance , 2008, Int. J. Robotics Res..

[31]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.