IM2GPS: estimating geographic information from a single image

Estimating geographic information from an image is an excellent, difficult high-level computer vision problem whose time has come. The emergence of vast amounts of geographically-calibrated image data is a great reason for computer vision to start looking globally - on the scale of the entire planet! In this paper, we propose a simple algorithm for estimating a distribution over geographic locations from a single image using a purely data-driven scene matching approach. For this task, we leverage a dataset of over 6 million GPS-tagged images from the Internet. We represent the estimated image location as a probability distribution over the Earth's surface. We quantitatively evaluate our approach in several geolocation tasks and demonstrate encouraging performance (up to 30 times better than chance). We show that geolocation estimates can provide the basis for numerous other image understanding tasks such as population density estimation, land cover estimation or urban/rural classification.
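The data-driven scene matching idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the two-dimensional feature vectors, the toy database, the neighbor count `k`, the grid resolution `cell_deg`, and the function names `knn_geolocate` and `haversine_km` are all assumptions made for illustration. The sketch finds the nearest GPS-tagged database images in feature space and turns their locations into a coarse histogram over latitude/longitude cells, which stands in for a probability distribution over the Earth's surface.

```python
import math
from collections import Counter

# Hypothetical toy database: (feature_vector, (latitude, longitude)).
# Real scene descriptors would be much higher-dimensional.
DATABASE = [
    ([0.90, 0.10], (48.86, 2.35)),     # Paris
    ([0.80, 0.20], (48.85, 2.29)),     # Paris
    ([0.85, 0.30], (51.51, -0.13)),    # London
    ([0.10, 0.90], (-33.87, 151.21)),  # Sydney
    ([0.20, 0.80], (-33.86, 151.20)),  # Sydney
]


def knn_geolocate(query, database, k=3, cell_deg=10.0):
    """Return a dict mapping grid cells to probabilities.

    The k database images closest to the query in feature space
    (Euclidean distance) each cast one vote for the grid cell that
    contains their GPS tag; votes are normalized to sum to 1.
    """
    ranked = sorted(database, key=lambda item: math.dist(query, item[0]))
    votes = Counter()
    for _, (lat, lon) in ranked[:k]:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        votes[cell] += 1
    total = sum(votes.values())
    return {cell: count / total for cell, count in votes.items()}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


if __name__ == "__main__":
    distribution = knn_geolocate([0.88, 0.12], DATABASE)
    for cell, prob in sorted(distribution.items(), key=lambda kv: -kv[1]):
        print(cell, round(prob, 3))
```

A great-circle distance such as `haversine_km` is one plausible way to score a location estimate against the true GPS tag; the evaluation protocol actually used in the paper is not described in the abstract.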
