Inferring the geographic focus of online documents from social media sharing patterns

Determining the geographic focus of digital media is an essential first step for modern geographic information retrieval. However, publicly visible location annotations are remarkably sparse in online data. In this work, we demonstrate a method that infers the geographic focus of an online document by examining the locations of Twitter users who share links to the document. We apply our geotagging technique to multiple datasets built from different content: manually annotated news articles, GDELT, YouTube, Flickr, Twitter, and Tumblr.
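The core idea, estimating a document's focus from the locations of the users who share it, can be illustrated with a minimal sketch. The abstract does not say how sharer locations are aggregated, so the medoid used here (the sharer location with the smallest total great-circle distance to all other sharers) is an assumption chosen for its robustness to outliers, not the authors' method. The names `haversine_km` and `infer_focus` are hypothetical, and the sketch assumes each sharer's location is already known as a (latitude, longitude) pair in degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def infer_focus(sharer_locations):
    """Hypothetical aggregator (an assumption, not the paper's method).

    Returns the medoid of the sharer locations, i.e. the location that
    minimizes the summed great-circle distance to all sharers, or None
    when no sharer locations are available.
    """
    if not sharer_locations:
        return None
    return min(
        sharer_locations,
        key=lambda candidate: sum(
            haversine_km(candidate, other) for other in sharer_locations
        ),
    )


# Illustrative input: three sharers near Los Angeles and one in London.
sharers = [(34.05, -118.24), (34.10, -118.30), (33.95, -118.40), (51.51, -0.13)]
focus = infer_focus(sharers)
print(focus)
```

With this input the single London sharer does not pull the estimate away from the Los Angeles cluster, which is the behavior one would want when a few distant users share a locally focused document. The quadratic cost of the medoid is acceptable for a sketch but would need replacing for documents with many sharers.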
