Robust and accurate mobile visual localization and its applications

Mobile applications are becoming increasingly popular. More and more people are using their phones to enjoy ubiquitous location-based services (LBS). The increasing popularity of LBS creates a fundamental problem: mobile localization. Besides traditional localization methods that use GPS or wireless signals, using phone-captured images for localization has drawn significant interest from researchers. Photos contain more scene context information than the embedded sensors, leading to a more precise location description. With the goal being to accurately sense real geographic scene contexts, this article presents a novel approach to mobile visual localization according to a given image (typically associated with a rough GPS position). The proposed approach is capable of providing a complete set of more accurate parameters about the scene geo-context including the real locations of both the mobile user and perhaps more importantly the captured scene, as well as the viewing direction. To figure out how to make image localization quick and accurate, we investigate various techniques for large-scale image retrieval and 2D-to-3D matching. Specifically, we first generate scene clusters using joint geo-visual clustering, with each scene being represented by a reconstructed 3D model from a set of images. The 3D models are then indexed using a visual vocabulary tree structure. Taking geo-tags of the database image as prior knowledge, a novel location-based codebook weighting scheme proposed to embed this additional information into the codebook. The discriminative power of the codebook is enhanced, thus leading to better image retrieval performance. The query image is aligned with the models obtained from the image retrieval results, and eventually registered to a real-world map. We evaluate the effectiveness of our approach using several large-scale datasets and achieving estimation accuracy of a user's location within 13 meters, viewing direction within 12 degrees, and viewing distance within 26 meters. Of particular note is our showcase of three novel applications based on localization results: (1) an on-the-spot tour guide, (2) collaborative routing, and (3) a sight-seeing guide. The evaluations through user studies demonstrate that these applications are effective in facilitating the ideal rendezvous for mobile users.

[1]  Michael Kroepfl,et al.  Efficiently locating photographs in many panoramas , 2010, GIS '10.

[2]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, ECCV.

[3]  Tomás Pajdla,et al.  Avoiding Confusing Features in Place Recognition , 2010, ECCV.

[4]  Qi Tian,et al.  Spatial coding for large scale partial-duplicate web image search , 2010, ACM Multimedia.

[5]  Jan-Michael Frahm,et al.  Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs , 2008, International Journal of Computer Vision.

[6]  Michael Isard,et al.  Descriptor Learning for Efficient Retrieval , 2010, ECCV.

[7]  Tat-Seng Chua,et al.  ViewFocus: explore places of interests on Google maps using photos with view direction filtering , 2009, MM '09.

[8]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[10]  Wen Gao,et al.  Location Discriminative Vocabulary Coding for Mobile Landmark Search , 2011, International Journal of Computer Vision.

[11]  Daniel P. Huttenlocher,et al.  Location Recognition Using Prioritized Feature Matching , 2010, ECCV.

[12]  Richard Szeliski,et al.  City-Scale Location Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[14]  Klas Josephson,et al.  Pose estimation with radial distortion and unknown focal length , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  LiHouqiang,et al.  Robust and accurate mobile visual localization and its applications , 2013 .

[16]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[17]  Barry Smyth,et al.  The social camera: a case-study in contextual image recommendation , 2011, IUI '11.

[18]  Martin Byröd,et al.  Pose estimation with radial distortion and unknown focal length , 2009, CVPR.

[19]  Andrew Zisserman,et al.  Multiple View Geometry in Computer Vision (2nd ed) , 2003 .

[20]  Ming Yang,et al.  Query Specific Fusion for Image Retrieval , 2012, ECCV.

[21]  Torsten Sattler,et al.  Fast image-based localization using direct 2D-to-3D matching , 2011, 2011 International Conference on Computer Vision.

[22]  Panu Turcot,et al.  Better matching with fewer features: The selection of useful features in large database recognition problems , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[23]  Jiebo Luo,et al.  Beyond GPS: determining the camera viewing direction of a geotagged image , 2010, ACM Multimedia.

[24]  Bernd Girod,et al.  Mobile Visual Search , 2011, IEEE Signal Processing Magazine.

[25]  Andrew Zisserman,et al.  Multiple view geometry in computer visiond , 2001 .

[26]  Tao Mei,et al.  Finding perfect rendezvous on the go: accurate mobile visual localization and its applications to routing , 2012, ACM Multimedia.

[27]  Alexei A. Efros,et al.  What makes Paris look like Paris? , 2015, Commun. ACM.

[28]  Delbert Dueck,et al.  Clustering by Passing Messages Between Data Points , 2007, Science.

[29]  David Nistér,et al.  An efficient solution to the five-point relative pose problem , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Jan-Michael Frahm,et al.  From structure-from-motion point clouds to fast location recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Rongrong Ji,et al.  Active query sensing for mobile location search , 2011, ACM Multimedia.

[32]  Tao Mei,et al.  When recommendation meets mobile: contextual and personalized recommendation on the go , 2011, UbiComp '11.

[33]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[34]  Anas Al-Nuaimi,et al.  Mobile Visual Location Recognition , 2013 .

[35]  Xin Chen,et al.  City-scale landmark identification on mobile devices , 2011, CVPR 2011.

[36]  Jiebo Luo,et al.  Geotagging in multimedia and computer vision—a survey , 2010, Multimedia Tools and Applications.

[37]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[38]  Newton Lee,et al.  ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMCCAP) , 2007, CIE.

[39]  Mubarak Shah,et al.  Accurate Image Localization Based on Google Maps Street View , 2010, ECCV.

[40]  Ming Yang,et al.  Contextual weighting for vocabulary tree based image retrieval , 2011, 2011 International Conference on Computer Vision.

[41]  Yannis Avrithis,et al.  Retrieving landmark and non-landmark images from community photo collections , 2010, ACM Multimedia.

[42]  Wen Gao,et al.  Towards low bit rate mobile visual search with multiple-channel coding , 2011, ACM Multimedia.

[43]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Horst Bischof,et al.  From structure-from-motion point clouds to fast location recognition , 2009, CVPR.