Geo-location inference from image content and user tags

Associating images with their geographic locations has attracted growing interest in the computer vision community in recent years. Recent work found that large collections of geotagged images help estimate the geo-location of a query image through simple visual nearest-neighbor search. In this paper, we leverage user tags along with image content to infer geo-location. Our model builds on the observation that both the visual content and the user tags of a picture can provide significant hints about where it was taken. Using a collection of over a million geotagged photographs, we build location probability maps of user tags over the entire globe. These maps reflect the picture-taking and tagging behaviors of thousands of users from all over the world, and reveal interesting tag map patterns. Visual content matching is performed using multiple feature descriptors, including tiny images, color histograms, GIST features, and bags of textons. The combination of visual content matching and local tag probability maps forms a strong geo-inference engine. Large-scale experiments show significant improvements over geo-location inference based on visual content alone.
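To make the idea of combining tag probability maps with visual nearest-neighbor matches concrete, the following is a minimal sketch and not the authors' implementation. The 10-degree grid, the additive smoothing constant `eps`, the log-product scoring rule, and the function names `cell_of`, `build_tag_maps`, and `infer_cell` are all assumptions made for illustration. The visual matching step is assumed to have already returned the coordinates of the nearest neighbors.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 10.0  # assumed grid resolution in degrees (not from the paper)


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Map a latitude/longitude pair to a coarse grid cell index."""
    return (math.floor((lat + 90.0) / cell_deg),
            math.floor((lon + 180.0) / cell_deg))


def build_tag_maps(photos, cell_deg=CELL_DEG):
    """Estimate P(cell | tag) by counting where each tag was used.

    `photos` is an iterable of (lat, lon, tags) tuples.
    """
    counts = defaultdict(Counter)
    for lat, lon, tags in photos:
        cell = cell_of(lat, lon, cell_deg)
        for tag in set(tags):  # count each tag once per photo
            counts[tag][cell] += 1
    maps = {}
    for tag, per_cell in counts.items():
        total = sum(per_cell.values())
        maps[tag] = {cell: n / total for cell, n in per_cell.items()}
    return maps


def infer_cell(query_tags, neighbor_coords, tag_maps,
               cell_deg=CELL_DEG, eps=1e-3):
    """Pick the cell maximizing visual-neighbor votes times tag probabilities.

    `neighbor_coords` holds (lat, lon) of visually similar geotagged photos.
    Scores are summed in log space; `eps` is an assumed smoothing constant.
    """
    votes = Counter(cell_of(lat, lon, cell_deg) for lat, lon in neighbor_coords)
    candidates = set(votes)
    for tag in query_tags:
        candidates.update(tag_maps.get(tag, {}))
    if not candidates:
        return None
    scores = {}
    for cell in candidates:
        score = math.log(votes[cell] + eps)
        for tag in query_tags:
            if tag in tag_maps:  # unseen tags carry no location evidence
                score += math.log(tag_maps[tag].get(cell, 0.0) + eps)
        scores[cell] = score
    return max(scores, key=scores.get)


# Toy data: the tag "eiffel" appears mostly near Paris, once in Las Vegas.
photos = [
    (48.85, 2.35, ["eiffel", "paris"]),
    (48.86, 2.29, ["eiffel"]),
    (36.10, -115.17, ["eiffel", "vegas"]),
    (40.70, -74.00, ["nyc"]),
]
tag_maps = build_tag_maps(photos)
# Visual matching alone is ambiguous here: one neighbor in each city.
neighbors = [(36.10, -115.17), (48.85, 2.35)]
best = infer_cell(["eiffel"], neighbors, tag_maps)
```

In this toy case the visual neighbors split evenly between two cells, and the tag map breaks the tie toward the cell where the tag was used more often. A real system would likely use a finer grid or a kernel density estimate and a learned weighting between the two cues.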
