Geotagging Text Content With Language Models and Feature Mining

The large-scale availability of user-generated content on social media platforms has recently opened up new possibilities for studying and understanding the geospatial aspects of real-world phenomena and events. Yet the vast majority of user-generated content lacks explicit geographic information in the form of latitude and longitude coordinates. As a result, the problem of multimedia geotagging, i.e., extracting location information from user-generated text items when it is not explicitly available, has attracted increasing research interest. Here, we present a highly accurate geotagging approach for estimating the locations alluded to by text annotations, based on refined language models learned from massive corpora of social media annotations. We further explore the impact of different feature selection and weighting techniques on the performance of the approach. For evaluation, we employ a large benchmark collection from the MediaEval Placing Task spanning several years. Using various data sets and comparing against a number of state-of-the-art systems, we demonstrate that the proposed approach achieves consistently superior geotagging accuracy and a low median distance error.
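
To make the general idea of language-model-based geotagging concrete, the following Python sketch shows a basic grid-cell language model. It is an illustration under stated assumptions and not the refined model described here. The 1-degree grid, the per-item tag counts, the additive aggregation of p(cell | tag) scores, and the spatial-entropy filter (a simple stand-in for the feature selection step) are all assumptions. Names such as train, predict, and select_features are hypothetical. Errors are measured as the great-circle (haversine) distance on a spherical Earth between predicted and true coordinates and are summarized by their median.

```python
import math
from collections import defaultdict
from statistics import median

CELL_DEG = 1.0  # hypothetical grid resolution in degrees (an assumption, not the paper's setting)


def cell_of(lat, lon, size=CELL_DEG):
    """Map a coordinate to the (row, col) index of a rectangular lat/lon grid cell."""
    return (math.floor(lat / size), math.floor(lon / size))


def cell_center(cell, size=CELL_DEG):
    """Return the (lat, lon) center of a grid cell."""
    row, col = cell
    return ((row + 0.5) * size, (col + 0.5) * size)


def train(items):
    """items: iterable of (tags, lat, lon). Returns tag -> {cell: p(cell | tag)}."""
    counts = defaultdict(lambda: defaultdict(int))
    for tags, lat, lon in items:
        c = cell_of(lat, lon)
        for tag in set(tags):  # count each tag at most once per item
            counts[tag][c] += 1
    model = {}
    for tag, cells in counts.items():
        total = sum(cells.values())
        model[tag] = {c: n / total for c, n in cells.items()}
    return model


def spatial_entropy(dist):
    """Shannon entropy (nats) of a tag's cell distribution; low values mean geographically focused tags."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def select_features(model, max_entropy=1.0):
    """Keep only tags whose cell distribution is concentrated (hypothetical threshold)."""
    return {t: d for t, d in model.items() if spatial_entropy(d) <= max_entropy}


def predict(model, tags):
    """Sum p(cell | tag) over known tags; return the best cell's center, or None if no tag is known."""
    scores = defaultdict(float)
    for tag in set(tags):
        for c, p in model.get(tag, {}).items():
            scores[c] += p
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return cell_center(best)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points on a spherical Earth."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Toy training data (illustrative only): (tags, latitude, longitude)
    train_set = [
        (["eiffel", "paris"], 48.858, 2.294),
        (["louvre", "paris"], 48.861, 2.336),
        (["colosseum", "rome"], 41.890, 12.492),
    ]
    model = select_features(train(train_set))
    test_set = [(["paris", "museum"], 48.860, 2.340), (["rome"], 41.900, 12.500)]
    errors = [haversine_km(predict(model, tags), (lat, lon)) for tags, lat, lon in test_set]
    print("median distance error (km):", median(errors))
```

Coarser or finer grids trade recall for precision, and the refined approach presumably replaces the naive counts and the entropy threshold with its own selection and weighting criteria. The sketch only shows where those components plug in.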
