Inferring photographic location using geotagged web images

Geotagging is a recent phenomenon that lets users visualize and manage photo collections in many new and interesting ways. Unfortunately, manually geotagging a large collection of pictures on the globe remains a time-consuming and laborious task, even though geotagging devices are gradually being adopted. At the same time, there exist billions of legacy pictures taken before the onset of geotagging. Recently, large collections of Web images have been found to facilitate a number of image understanding tasks, including geolocation estimation. In this paper, we leverage user tags along with image content to infer the geolocation of images. Our model builds on the observation that the visual content and user tags of a picture together provide significant hints about its geolocation. Using a collection of over a million geotagged pictures, we build location probability maps over the entire globe for commonly used image tags. These maps reflect the collective picture-taking and tagging behavior of thousands of users from all over the world. We further study the geographic entropy and frequency of user tags as geo-inference features and investigate how useful these features are for selecting geographically meaningful annotations. In parallel, visual content matching is performed using multiple feature descriptors, including tiny images, color histograms, GIST features, and bags of textons. Finally, a geographic mapping scheme based on visual k-nearest-neighbor (KNN) matching is integrated with the tag location probability maps to form a strong geo-inference engine. Experiments show that the combined engine improves on geolocation inference performed using either modality alone.
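The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 10-degree grid, the function names, the `eps` smoothing constant, and the log-probability fusion rule are all assumptions made for illustration, and the visual KNN step is assumed to have already returned the coordinates of the nearest geotagged photos.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 10.0  # assumed grid resolution in degrees; not taken from the paper


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Map a latitude/longitude pair to a coarse grid cell index."""
    return (int(math.floor((lat + 90.0) / cell_deg)),
            int(math.floor((lon + 180.0) / cell_deg)))


def build_tag_maps(photos):
    """Build tag -> {cell: P(cell | tag)} from (lat, lon, tags) records."""
    counts = defaultdict(Counter)
    for lat, lon, tags in photos:
        cell = cell_of(lat, lon)
        for tag in set(tags):  # count each tag once per photo
            counts[tag][cell] += 1
    maps = {}
    for tag, ctr in counts.items():
        total = sum(ctr.values())
        maps[tag] = {cell: n / total for cell, n in ctr.items()}
    return maps


def geographic_entropy(prob_map):
    """Shannon entropy (bits) of a tag's location distribution.

    Low entropy means the tag is concentrated in few cells and is
    therefore more geographically informative.
    """
    return -sum(p * math.log2(p) for p in prob_map.values() if p > 0)


def knn_vote_map(neighbors):
    """Turn the (lat, lon) of the K visually nearest photos into cell votes."""
    votes = Counter(cell_of(lat, lon) for lat, lon in neighbors)
    total = sum(votes.values())
    return {cell: n / total for cell, n in votes.items()}


def fuse(tag_maps, tags, visual_map, eps=1e-6):
    """Pick the cell with the highest summed log-probability.

    Assumed fusion rule: tags and visual votes are treated as independent
    evidence; eps keeps unseen cells from producing log(0).
    """
    known = [tag_maps[t] for t in tags if t in tag_maps]
    cells = set(visual_map)
    for m in known:
        cells |= set(m)
    scores = {}
    for cell in cells:
        score = math.log(visual_map.get(cell, 0.0) + eps)
        for m in known:
            score += math.log(m.get(cell, 0.0) + eps)
        scores[cell] = score
    return max(scores, key=scores.get) if scores else None
```

In this sketch a tag seen in only one cell has zero geographic entropy and dominates the fused score, while a tag spread evenly over many cells contributes little, which mirrors the role the abstract assigns to entropy when selecting geographically meaningful annotations.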
