Chapter 3 Large-Scale Image Geolocalization

In this chapter, we explore the task of global image geolocalization— estimating where on the Earth a photograph was captured. We examine variants of the “im2gps” algorithm using millions of “geotagged” Internet photographs as training data. We first discuss a simple to understand nearest-neighbor baseline. Next, we introduce a lazy-learning approach with more sophisticated features that doubles the performance of the original “im2gps” algorithm. Beyond quantifying geolocalization accuracy, we also analyze (a) how the nonuniform distribution of training data impacts the algorithm (b) how performance compares to baselines such as random guessing and land-cover recognition and (c) whether geolocalization is simply landmark or “instance level” recognition at a large scale. We also show that geolocation estimates can provide the basis for image understanding tasks such as population density estimation or land cover estimation. This work was originally described, in part, in “im2gps” [9] which was the first attempt at global geolocalization using Internet-derived training data.

[1]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Marc Pollefeys,et al.  Large Scale Visual Geo-Localization of Images in Mountainous Terrain , 2012, ECCV.

[3]  Alexei A. Efros,et al.  Where in the world? Human and computer geolocation of images , 2010 .

[4]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Jitendra Malik,et al.  When is scene recognition just texture recognition , 2010 .

[6]  Jiebo Luo,et al.  Geotagging in multimedia and computer vision—a survey , 2010, Multimedia Tools and Applications.

[7]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Hartwig Adam,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[10]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[11]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[12]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2008, Commun. ACM.

[15]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Alexei A. Efros,et al.  Photo clip art , 2007, ACM Trans. Graph..

[19]  M. Bar The proactive brain: using analogies and associations to generate predictions , 2007, Trends in Cognitive Sciences.

[20]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[22]  Bernt Schiele,et al.  Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2007, International Journal of Computer Vision.

[23]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[24]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[25]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[26]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[27]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[28]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[29]  Andrew W. Moore,et al.  Locally Weighted Learning , 1997, Artificial Intelligence Review.

[30]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[31]  Wei Zhang,et al.  Video Compass , 2002, ECCV.

[32]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[34]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[35]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[36]  William B. Thompson,et al.  Geometric Reasoning for Map-Based Localization , 1996 .

[37]  R B Godwin-Austen,et al.  Where am I? , 1982, British medical journal.