Georeferencing Wikipedia Documents Using Data from Social Media Sources

Social media sources such as Flickr and Twitter continuously generate large amounts of textual information (tags on Flickr and short messages on Twitter). This textual information is increasingly linked to geographical coordinates, which makes it possible to learn how people refer to places by identifying correlations between the occurrence of terms and the locations of the corresponding social media objects. Recent work has focused on how this potentially rich source of geographic information can be used to estimate geographic coordinates for previously unseen Flickr photos or Twitter messages. In this article, we extend this work by analysing to what extent probabilistic language models trained on Flickr and Twitter can be used to assign coordinates to Wikipedia articles. Our results show that exploiting these language models substantially outperforms both (i) classical gazetteer-based methods (in particular, using Yahoo! Placemaker and Geonames) and (ii) language modelling approaches trained on Wikipedia alone. This supports the hypothesis that social media are important sources of geographic information, which are valuable beyond the scope of individual applications.
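The general idea of a location-specific language model can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-size grid, the additive smoothing constant `alpha`, the uniform prior over cells, and all function names (`cell_of`, `train`, `georeference`) are assumptions made for illustration only. The sketch trains one unigram model per grid cell from georeferenced tag sets and assigns a new document to the centre of the cell whose model gives its terms the highest likelihood.

```python
import math
from collections import Counter, defaultdict

CELL_DEG = 1.0  # hypothetical grid resolution in degrees (an assumption)


def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the (row, col) index of its grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def cell_centre(cell, size=CELL_DEG):
    """Return the (lat, lon) centre of a grid cell."""
    row, col = cell
    return ((row + 0.5) * size, (col + 0.5) * size)


def train(items, size=CELL_DEG):
    """Build per-cell term counts from (terms, lat, lon) training items."""
    counts = defaultdict(Counter)
    vocab = set()
    for terms, lat, lon in items:
        counts[cell_of(lat, lon, size)].update(terms)
        vocab.update(terms)
    return counts, vocab


def georeference(terms, counts, vocab, alpha=1.0, size=CELL_DEG):
    """Return the centre of the most likely cell, or None if no term is known.

    Uses additive (Laplace-style) smoothing and a uniform prior over cells;
    both are simplifying assumptions of this sketch.
    """
    known = [t for t in terms if t in vocab]
    if not known:
        return None
    best_cell, best_score = None, -math.inf
    for cell, cell_counts in counts.items():
        total = sum(cell_counts.values())
        denom = total + alpha * len(vocab)
        score = sum(math.log((cell_counts[t] + alpha) / denom) for t in known)
        if score > best_score:
            best_cell, best_score = cell, score
    return cell_centre(best_cell, size)


# Toy training data standing in for georeferenced Flickr tags (invented values).
training = [
    (["eiffel", "paris", "tower"], 48.858, 2.294),
    (["louvre", "paris", "museum"], 48.861, 2.336),
    (["bigben", "london", "thames"], 51.501, -0.125),
    (["london", "tower", "bridge"], 51.505, -0.075),
]
counts, vocab = train(training)
estimate = georeference(["paris", "museum", "river"], counts, vocab)
```

Terms that never occur in the training data (here `river`) are ignored, so a document with no known terms receives no estimate rather than an arbitrary one. A coarser or finer grid can be tried by passing a different `size` to both `train` and `georeference`.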
