Vision-Based Fine-Grained Location Estimation

In this chapter, we explore a variety of vision-based location estimation techniques, in which the goal is to determine the location of an image at a fine-grained level. First, we introduce the concept of image-based location and landmark recognition (Sect. 4.1), which determines the location of a given image by leveraging collections of geo-located images. Early techniques usually treat this as an image similarity matching problem and transfer the geo-tags of the matched database images to the query image. Some recent works have examined how to estimate more fine-grained and comprehensive geo-context information, such as the viewing direction of photos (Sect. 4.3). Next, we review techniques for city-scale location recognition, informative codebook generation, and geo-visual clustering (Sect. 4.4). We then introduce the structure-from-motion technique, which is closely related to estimating the camera geo-location by generating 3D models. With the 3D scenes reconstructed from the image collections, images are localized by 2D–3D alignment (Sect. 4.5). The camera location, viewing direction, and scene location are estimated simultaneously, all of which are essential to various applications. Finally, we describe another class of vision-based location estimation techniques, which uses a satellite-imagery database (Sect. 4.6).
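The geo-tag transfer used by the early matching-based techniques can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a fixed-length descriptor vector and that similarity is plain Euclidean distance; the function name `transfer_geotag`, the toy descriptors, and the coordinates are hypothetical and are not taken from the chapter.

```python
import math

def transfer_geotag(query_descriptor, database):
    """Return the geo-tag of the database image closest to the query.

    `database` is a list of (descriptor, (latitude, longitude)) pairs.
    Euclidean distance between descriptors is an assumption of this
    sketch; real systems typically use local features and indexing.
    """
    if not database:
        raise ValueError("database must contain at least one geo-tagged image")
    # Pick the entry whose descriptor is nearest to the query descriptor.
    _, best_tag = min(
        database,
        key=lambda entry: math.dist(query_descriptor, entry[0]),
    )
    return best_tag

# Toy database: three geo-tagged images with 3-dimensional descriptors.
toy_database = [
    ([0.9, 0.1, 0.0], (40.6892, -74.0445)),
    ([0.1, 0.8, 0.1], (48.8584, 2.2945)),
    ([0.0, 0.2, 0.9], (51.5007, -0.1246)),
]

estimated_location = transfer_geotag([0.2, 0.7, 0.1], toy_database)
print(estimated_location)
```

The sketch returns only the single nearest match. Using several matches, or verifying them geometrically, is a design choice that this example leaves out.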
