Modeling locations with social media

In this paper we focus on the locations that are explicit and implicit in users' descriptions of their surroundings. We propose a statistical language modeling approach to identifying locations in arbitrary text, and investigate several ways to estimate the models, based on term frequency and on user frequency. The geotagged public photos in Flickr serve as a convenient ground truth. Our results show that, using only a photo’s tags, we can predict its location within a one-kilometer by one-kilometer cell with 17% accuracy, and within a three-kilometer radius around such a cell with 40% accuracy. This is significantly better than the state of the art. Further, we examine several estimation strategies that leverage the physical proximity of places, and show that for sparsely represented locations, smoothing from the immediate neighborhood improves results. We also show that estimation strategies based on user frequency are much more reliable than approaches based on raw term frequency.
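To make the contrast between term frequency and user frequency concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a square grid of 0.01-degree cells as a rough stand-in for one-kilometer cells, a unigram tag model per cell, and Dirichlet-style smoothing against the whole collection with an arbitrary `mu`. The names `cell_of`, `train`, `predict`, and the `PHOTOS` data are all hypothetical, and smoothing from neighboring cells is left out for brevity.

```python
import math
from collections import defaultdict

def cell_of(lat, lon, size_deg=0.01):
    """Map a coordinate to a grid cell (assumed 0.01 degree, roughly 1 km)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))

def train(photos, use_user_freq=True):
    """Count tags per cell.

    With use_user_freq=True a tag is counted once per distinct user in a
    cell; otherwise it is counted once per photo (raw term frequency).
    """
    counts = defaultdict(lambda: defaultdict(int))
    seen = set()
    for user, lat, lon, tags in photos:
        cell = cell_of(lat, lon)
        for tag in set(tags):
            if use_user_freq:
                key = (cell, tag, user)
                if key in seen:
                    continue
                seen.add(key)
            counts[cell][tag] += 1
    return counts

def predict(counts, tags, mu=1.0):
    """Return the cell whose smoothed tag model best explains the tags."""
    # Collection-wide tag counts act as the background model.
    collection = defaultdict(int)
    for cell_counts in counts.values():
        for tag, n in cell_counts.items():
            collection[tag] += n
    total = sum(collection.values())
    known = [t for t in tags if t in collection]  # unseen tags are skipped
    if not known:
        return None
    best_cell, best_score = None, -math.inf
    for cell, cell_counts in counts.items():
        size = sum(cell_counts.values())
        score = sum(
            math.log((cell_counts.get(t, 0) + mu * collection[t] / total)
                     / (size + mu))
            for t in known
        )
        if score > best_score:
            best_cell, best_score = cell, score
    return best_cell

# Made-up data: two users tag "eiffel" in Paris, while one prolific user
# uploads 20 photos tagged "eiffel" in London.
PHOTOS = [
    ("alice", 48.8584, 2.2945, ["eiffel", "paris"]),
    ("bob", 48.8583, 2.2946, ["eiffel", "tower"]),
    ("carol", 51.5007, -0.1246, ["bigben", "london"]),
] + [("dave", 51.5008, -0.1247, ["eiffel"])] * 20

by_user = predict(train(PHOTOS, use_user_freq=True), ["eiffel"])
by_term = predict(train(PHOTOS, use_user_freq=False), ["eiffel"])
```

On this toy data, `by_user` is the Paris cell and `by_term` is the London cell: counting each user once keeps a single bulk uploader from dominating a cell's model, which is one plausible reading of why the user-frequency estimates are reported as more reliable.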
