GSP (Geo-Semantic-Parsing): Geoparsing and Geotagging with Machine Learning on Top of Linked Data

Recently, user-generated content in social media has opened up new possibilities for understanding the geospatial aspects of many real-world phenomena. Yet the vast majority of such content lacks explicit, structured geographic information. Here, we describe the design and implementation of GSP (Geo-Semantic-Parsing), a novel approach for associating geographic information with text documents. GSP applies machine learning on top of the rich, interconnected Linked Data in order to overcome limitations of previous state-of-the-art approaches. In detail, our technique (i) performs semantic annotation to identify relevant tokens in the input document, (ii) traverses a sub-graph of Linked Data to extract possible geographic information related to the identified tokens, and (iii) optimizes its results by means of a Support Vector Machine classifier. We compare our results with those of four state-of-the-art techniques and baselines on ground-truth data from two evaluation datasets. GSP achieves excellent performance, with a best \(F1 = 0.91\), substantially outperforming the benchmark techniques, which achieve \(F1 \le 0.78\).
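The three steps described above (annotation, Linked Data traversal, classifier-based filtering) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the GSP implementation: the toy graph, the `ex:` resource names, the lexicon, the confidence values, and the function names are hypothetical, and a hand-set linear decision function stands in for the trained Support Vector Machine.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for a Linked Data sub-graph:
# resource -> list of (predicate, object) pairs. Coordinates are approximate.
TOY_GRAPH = {
    "ex:Pisa": [("geo:lat", 43.72), ("geo:long", 10.40)],
    "ex:Leaning_Tower": [("ex:locatedIn", "ex:Pisa")],
    "ex:Paris_Hilton": [("ex:type", "ex:Person")],
}

# Hypothetical annotator lexicon: surface form -> (resource, confidence).
TOY_LEXICON = {
    "pisa": ("ex:Pisa", 0.9),
    "leaning tower": ("ex:Leaning_Tower", 0.8),
    "paris": ("ex:Paris_Hilton", 0.3),
}


@dataclass
class Candidate:
    surface: str
    resource: str
    confidence: float
    coords: Optional[Tuple[float, float]]
    hops: int


def annotate(text: str) -> List[Tuple[str, str, float]]:
    """Step 1: semantic annotation via naive substring lookup (toy version)."""
    lowered = text.lower()
    return [(s, r, c) for s, (r, c) in TOY_LEXICON.items() if s in lowered]


def find_coords(resource: str, max_hops: int = 2):
    """Step 2: breadth-first traversal looking for a lat/long pair."""
    frontier, seen = [resource], {resource}
    for hops in range(max_hops + 1):
        next_frontier = []
        for node in frontier:
            edges = TOY_GRAPH.get(node, [])
            values = dict(edges)
            if "geo:lat" in values and "geo:long" in values:
                return (values["geo:lat"], values["geo:long"]), hops
            for _, obj in edges:
                # Follow only links to resources present in the toy graph.
                if isinstance(obj, str) and obj in TOY_GRAPH and obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return None, max_hops


def keep(candidate: Candidate) -> bool:
    """Step 3: hand-set linear decision function, a stand-in for a trained SVM."""
    if candidate.coords is None:
        return False
    score = 2.0 * candidate.confidence - 0.5 * candidate.hops - 0.5
    return score > 0


def geoparse(text: str) -> List[Candidate]:
    """Run the three toy steps and return the candidates that pass the filter."""
    results = []
    for surface, resource, confidence in annotate(text):
        coords, hops = find_coords(resource)
        candidate = Candidate(surface, resource, confidence, coords, hops)
        if keep(candidate):
            results.append(candidate)
    return results
```

In this sketch, a mention such as "Leaning Tower" has no coordinates of its own but reaches them through one link, which is the kind of case graph traversal is meant to cover, while a low-confidence, non-geographic annotation is discarded by the filter. In a real system the weights in `keep` would be learned from labeled data rather than fixed by hand.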
