Large-Scale Image Geolocalization

In this chapter, we explore the task of global image geolocalization—estimating where on the Earth a photograph was captured. We examine variants of the “im2gps” algorithm using millions of “geotagged” Internet photographs as training data. We first discuss a simple to understand nearest-neighbor baseline. Next, we introduce a lazy-learning approach with more sophisticated features that doubles the performance of the original “im2gps” algorithm. Beyond quantifying geolocalization accuracy, we also analyze (a) how the nonuniform distribution of training data impacts the algorithm (b) how performance compares to baselines such as random guessing and land-cover recognition and (c) whether geolocalization is simply landmark or “instance level” recognition at a large scale. We also show that geolocation estimates can provide the basis for image understanding tasks such as population density estimation or land cover estimation. This work was originally described, in part, in “im2gps” [9] which was the first attempt at global geolocalization using Internet-derived training data.

[1]  Alexei A. Efros,et al.  Recovering Surface Layout from an Image , 2007, International Journal of Computer Vision.

[2]  Jiebo Luo,et al.  Geotagging in multimedia and computer vision—a survey , 2010, Multimedia Tools and Applications.

[3]  Jitendra Malik,et al.  SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[4]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[6]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[7]  Marc Pollefeys,et al.  Large Scale Visual Geo-Localization of Images in Mountainous Terrain , 2012, ECCV.

[8]  Bernt Schiele,et al.  Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2007, International Journal of Computer Vision.

[9]  Andrew W. Moore,et al.  Locally Weighted Learning , 1997, Artificial Intelligence Review.

[10]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[11]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Jon M. Kleinberg,et al.  Mapping the world's photos , 2009, WWW '09.

[13]  Jitendra Malik,et al.  When is scene recognition just texture recognition , 2010 .

[14]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Steven M. Seitz,et al.  Scene Summarization for Online Image Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16]  Michael Isard,et al.  Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Jitendra Malik,et al.  When is scene identification just texture recognition? , 2004, Vision Research.

[18]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[19]  Antonio Torralba,et al.  Building the gist of a scene: the role of global image features in recognition. , 2006, Progress in brain research.

[20]  Alexei A. Efros,et al.  Where in the world? Human and computer geolocation of images , 2010 .

[21]  Serge J. Belongie,et al.  Cross-View Image Geolocalization , 2013, CVPR.

[22]  Wei Zhang,et al.  Video Compass , 2002, ECCV.

[23]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[24]  Luc Van Gool,et al.  World-scale mining of objects and events from community photo collections , 2008, CIVR '08.

[25]  Alexei A. Efros,et al.  IM2GPS: estimating geographic information from a single image , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Dorin Comaniciu,et al.  Mean Shift: A Robust Approach Toward Feature Space Analysis , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[27]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[28]  Daniel C. Dennett Where Am I , 2016 .

[29]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[30]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Steven M. Seitz,et al.  Photo tourism: exploring photo collections in 3D , 2006, ACM Trans. Graph..

[33]  Wei Zhang,et al.  Image Based Localization in Urban Environments , 2006, Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06).

[34]  M. Bar The proactive brain: using analogies and associations to generate predictions , 2007, Trends in Cognitive Sciences.

[35]  Robert Pless,et al.  Geolocating Static Cameras , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[36]  Alexei A. Efros,et al.  Photo clip art , 2007, SIGGRAPH 2007.

[37]  Tat-Seng Chua,et al.  Tour the world: Building a web-scale landmark recognition engine , 2009, CVPR.

[38]  Alexei A. Efros,et al.  Image sequence geolocation with human travel priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[39]  William B. Thompson,et al.  Geometric reasoning under uncertainty for map-based localization , 1999, Spatial Cogn. Comput..