Image Based Localization in Urban Environments

In this paper we present a prototype system for image based localization in urban environments. Given a database of views of city street scenes tagged by GPS locations, the system computes the GPS location of a novel query view. We first use a wide-baseline matching technique based on SIFT features to select the closest views in the database. Often due to a large change of viewpoint and presence of repetitive structures, a large percentage of matches (> 50%) are not correct correspondences. The subsequent motion estimation between the query view and the reference view, is then handled by a novel and efficient robust estimation technique capable of dealing with large percentage of outliers. This stage is also accompanied by a model selection step among the fundamental matrix and the homography. Once the motion between the closest reference views is estimated, the location of the query view is then obtained by triangulation of translation directions. Approximate solutions for cases when triangulation cannot be obtained reliably are also described. The presented system is tested on the dataset used in ICCV 2005 Computer Vision Contest and is shown to have higher accuracy than previous reported results.

[1]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[2]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Andrew Zisserman,et al.  Multi-view Matching for Unordered Image Sets, or "How Do I Organize My Holiday Snaps?" , 2002, ECCV.

[4]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[5]  Wei Zhang,et al.  Video Compass , 2002, ECCV.

[6]  Luc Van Gool,et al.  HPAT Indexing for Fast Object/Scene Recognition Based on Local Appearance , 2003, CIVR.

[7]  Roberto Cipolla,et al.  An Image-Based System for Urban Navigation , 2004, BMVC.

[8]  T. Tuytelaars,et al.  Matching Widely Separated Views Based on Affine Invariant Regions , 2004, International Journal of Computer Vision.

[9]  Luc Van Gool,et al.  Fast wide baseline matching for visual navigation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[10]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[11]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[12]  Konrad Tollmar,et al.  Searching the Web with mobile images for location recognition , 2004, CVPR 2004.

[13]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Wei Zhang,et al.  A New Inlier Identification Scheme for Robust Estimation Problems , 2006, Robotics: Science and Systems.