Geocoding for texts with fine-grain toponyms: an experiment on a geoparsed hiking descriptions corpus

Geoparsing and geocoding are two essential middleware services that support end-user applications such as location-aware search and various location-based services. This work proposes a processing chain for geoparsing and geocoding text documents that describe events strongly tied to space and that make frequent use of fine-grain toponyms. The geoparsing component follows a Natural Language Processing approach that combines part-of-speech tags with syntactico-semantic patterns, implemented as a cascade of transducers. The main novelty, however, lies in the geocoding method. The geocoding algorithm is unsupervised and uses clustering techniques both to disambiguate toponyms found in gazetteers and to estimate the spatial footprint of fine-grain toponyms that gazetteers do not contain. The feasibility of the approach has been tested on a corpus of hiking descriptions in French, Spanish and Italian.
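To make the clustering idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. The abstract only says "clustering techniques", so plain DBSCAN with a haversine distance is used here as a stand-in. The hypothetical function `geocode` takes toponyms in text order, each with its list of gazetteer candidates as (lat, lon) pairs. It treats the cluster supported by the most distinct toponyms as the itinerary's area and, for each toponym, keeps the candidate inside that cluster closest to the cluster centre. Toponyms with no gazetteer candidates, or none inside the main cluster, get a rough footprint: the bounding box of the nearest resolved toponyms before and after them in the text. That footprint rule, the parameters `eps_km` and `min_pts`, and the toponym names and coordinates in the usage example are all illustrative assumptions.

```python
import math

Point = tuple[float, float]  # (lat, lon) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points: list[Point], eps_km: float, min_pts: int) -> list[int]:
    """Plain DBSCAN (stand-in clustering); returns a cluster id per point, -1 = noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i: int) -> list[int]:
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point -> border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels


def geocode(toponyms: list[tuple[str, list[Point]]], eps_km: float = 5.0, min_pts: int = 2):
    """Hypothetical sketch. Returns, per toponym, ("point", (lat, lon)),
    ("bbox", (min_lat, min_lon, max_lat, max_lon)) or None."""
    owners, pts = [], []
    for k, (_, cands) in enumerate(toponyms):
        for p in cands:
            owners.append(k)
            pts.append(p)
    labels = dbscan(pts, eps_km, min_pts)

    # Main cluster: the one supported by the largest number of distinct toponyms.
    support: dict[int, set[int]] = {}
    for owner, lab in zip(owners, labels):
        if lab >= 0:
            support.setdefault(lab, set()).add(owner)

    result = [None] * len(toponyms)
    if support:
        main = max(support, key=lambda c: len(support[c]))
        members = [p for p, lab in zip(pts, labels) if lab == main]
        # Arithmetic mean of lat/lon: adequate for a compact, local cluster.
        centre = (sum(p[0] for p in members) / len(members),
                  sum(p[1] for p in members) / len(members))
        for k in range(len(toponyms)):
            inside = [p for p, o, lab in zip(pts, owners, labels) if o == k and lab == main]
            if inside:  # disambiguation: keep the in-cluster candidate nearest the centre
                result[k] = ("point", min(inside, key=lambda p: haversine_km(p, centre)))

    # Assumed footprint rule for unresolved toponyms: bounding box of the nearest
    # resolved toponyms before and after them in text order.
    resolved = [k for k, r in enumerate(result) if r is not None]
    for k in range(len(toponyms)):
        if result[k] is not None:
            continue
        before = [j for j in resolved if j < k]
        after = [j for j in resolved if j > k]
        anchors = [result[j][1] for j in before[-1:] + after[:1]]
        if anchors:
            lats = [p[0] for p in anchors]
            lons = [p[1] for p in anchors]
            result[k] = ("bbox", (min(lats), min(lons), max(lats), max(lons)))
    return result


# Illustrative input: made-up names and coordinates, not data from the corpus.
toponyms = [
    ("Refuge A", [(42.70, 0.00), (45.90, 6.10)]),  # two homonyms in the gazetteer
    ("Col B", [(42.72, 0.02)]),
    ("Pas C", []),                                 # fine-grain toponym, no gazetteer entry
    ("Lac D", [(42.74, 0.05), (48.80, 2.30)]),
]
for (name, _), res in zip(toponyms, geocode(toponyms)):
    print(name, res)
```

Choosing the cluster backed by the most distinct toponyms rewards the spatial coherence expected of a single hike: far-away homonyms end up as noise. In practice, `eps_km` and `min_pts` would need tuning to the scale of the described routes.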
