Worldwide Pose Estimation Using 3D Point Clouds

We address the problem of determining where a photo was taken by estimating a full 6-DOF-plus-intrincs camera pose with respect to a large geo-registered 3D point cloud, bringing together research on image localization, landmark recognition, and 3D pose estimation. Our method scales to datasets with hundreds of thousands of images and tens of millions of 3D points through the use of two new techniques: a co-occurrence prior for RANSAC and bidirectional matching of image features with 3D points. We evaluate our method on several large data sets, and show state-of-the-art results on landmark recognition as well as the ability to locate cameras to within meters, requiring only seconds per query.

[1]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[2]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[3]  Pascal Fua,et al.  Dynamic and scalable large scale image reconstruction , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[6]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[7]  Michael Kroepfl,et al.  Efficiently locating photographs in many panoramas , 2010, GIS '10.

[8]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[9]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[10]  Jiri Matas,et al.  Matching with PROSAC - progressive sample consensus , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[13]  Bernd Girod,et al.  Outdoors augmented reality on mobile phone using loxel-based visual feature organization , 2008, MIR '08.

[14]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[15]  Daniel P. Huttenlocher,et al.  Landmark classification in large-scale image collections , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Alexei A. Efros,et al.  Estimating the Natural Illumination Conditions from a Single Outdoor Image , 2012, International Journal of Computer Vision.

[17]  Jiri Matas,et al.  Unsupervised discovery of co-occurrence in sparse high dimensional data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Michael F. Cohen,et al.  Real-time image-based 6-DOF localization in large-scale environments , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Noah Snavely,et al.  Accurate Georegistration of Point Clouds Using Geographic Data , 2013, 2013 International Conference on 3D Vision.

[20]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Dani Lischinski,et al.  Deep photo: model-based photograph enhancement and viewing , 2008, SIGGRAPH 2008.

[22]  Richard Szeliski,et al.  Building Rome in a day , 2009, ICCV.

[23]  Horst Bischof,et al.  From structure-from-motion point clouds to fast location recognition , 2009, CVPR.

[24]  Tat-Seng Chua,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, CVPR.

[25]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[26]  Zuzana Kukelova,et al.  A general solution to the P4P problem for camera with unknown focal length , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Frank Dellaert,et al.  GroupSAC: Efficient consensus in the presence of groupings , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[29]  Pascal Fua,et al.  LDAHash: Improved Matching with Smaller Descriptors , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  David W. Murray,et al.  Guided Sampling and Consensus for Motion Estimation , 2002, ECCV.

[32]  Andrew Owens,et al.  Discrete-continuous optimization for large-scale structure from motion , 2011, CVPR.

[33]  Sunil Arya,et al.  Approximate nearest neighbor queries in fixed dimensions , 1993, SODA '93.