A multi-layer markup language for geospatial semantic annotations

In this paper we describe a markup language for semantically annotating raw texts. We define a formal representation of natural-language text documents that can be applied to the tasks of Named Entity Recognition and Spatial Role Labeling. The proposal relies on a multi-layer annotation process built on a core generic layer, which can be freely adapted into more specific layers depending on the intended goal. Our markup language builds on the TEI Guidelines to provide a generic and extensible markup language. It is designed specifically for text mining and can readily be extended with further layers of semantic relationships between elements of the text. We demonstrate the feasibility of this proposal by moving from a generic annotation of texts describing itineraries to a geospatial semantic annotation.
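As a rough illustration of the layering idea, the sketch below builds a tiny TEI-flavoured annotation with Python's standard `xml.etree.ElementTree` module. A generic core layer records each token as a `<w>` element carrying POS and lemma attributes, and a more specific geospatial layer is then added on top by wrapping place names in `<placeName>`, without altering the core tokens. The example sentence, its tags, the functions `core_layer` and `spatial_layer`, and this particular use of TEI's `<s>`, `<w>` and `<placeName>` elements are all hypothetical assumptions for illustration, not the scheme actually defined in the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-tokenised sentence: (form, POS tag, lemma).
# Tags and lemmas are illustrative assumptions, not output of a real tagger.
TOKENS = [
    ("We", "PRP", "we"),
    ("walk", "VBP", "walk"),
    ("towards", "IN", "towards"),
    ("Pau", "NNP", "Pau"),
    (".", ".", "."),
]


def core_layer(tokens):
    """Generic (core) layer: one <s> element with a <w> child per token."""
    s = ET.Element("s")
    for form, pos, lemma in tokens:
        w = ET.SubElement(s, "w", {"pos": pos, "lemma": lemma})
        w.text = form
    return s


def spatial_layer(s, place_names):
    """Specific layer (hypothetical): wrap <w> tokens naming places in <placeName>.

    The original <w> elements are kept unchanged inside the wrapper, so the
    core annotation remains available underneath the geospatial one.
    """
    for i, child in enumerate(list(s)):  # iterate over a snapshot of the children
        if child.tag == "w" and child.text in place_names:
            wrapper = ET.Element("placeName")
            s.remove(child)
            wrapper.append(child)
            s.insert(i, wrapper)  # same position, so later indices stay valid
    return s


if __name__ == "__main__":
    sentence = spatial_layer(core_layer(TOKENS), {"Pau"})
    print(ET.tostring(sentence, encoding="unicode"))
```

Keeping the specific layer as a wrapper around untouched core elements is one way to let several goal-specific layers share the same generic base; how the paper itself combines layers may differ.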