Cultivating trees: adding several semantic layers to the Lassy treebank in SoNaR

Within the STEVIN project Large Scale Syntactic Annotation of written Dutch (LASSY), a manually corrected treebank of 1 million words is being constructed. Lassy is part of a series of annotation projects for modern written and spoken Dutch. More specifically, it is an extension of the D-Coi and CGN projects, and constitutes the core of SoNaR, a 500-million-word reference corpus of modern written Dutch. One of the goals of the latter project is to enrich the corrected treebank produced in Lassy with several semantic layers. For a general overview of the relations between D-Coi, Lassy and SoNaR, cf. [19]. In this paper we concentrate on the semantic layers of the SoNaR core: (1) named entity labeling, (2) annotation of co-reference relations, (3) semantic role labeling and (4) annotation of spatial and temporal relations. Of these, (2) originates from the STEVIN project COREA, (3) and (4) from D-Coi, whereas (1) is a new area within STEVIN.

[1] Ineke Schuurman, et al. Spatiotemporal Annotation on Top of an Existing Treebank, 2007.

[2] Véronique Hoste, et al. KNACK-2002: a Richly Annotated Corpus of Dutch Written Text, 2006, LREC.

[3] Ineke Schuurman. Spatiotemporal Annotation Using MiniSTEx: how to deal with Alternative, Foreign, Vague and/or Obsolete Names?, 2008, LREC.

[4] Luis Gravano, et al. Computing Geographical Scopes of Web Resources, 2000, VLDB.

[5] Josef Ruppenhofer, et al. FrameNet: Theory and Practice, 2003.

[6] Ineke Schuurman. Which New York, which Monday? The role of background knowledge and intended audience in automatic disambiguation of spatiotemporal expressions, 2007, CLIN 2007.

[7] Walter Daelemans, et al. Memory-Based Named Entity Recognition using Unannotated Data, 2003, CoNLL.

[8] Hwee Tou Ng, et al. A Machine Learning Approach to Coreference Resolution of Noun Phrases, 2001, CL.

[9] Tong Zhang, et al. Named Entity Recognition through Classifier Combination, 2003, CoNLL.

[10] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[11] Nelleke Oostdijk, et al. From D-Coi to SoNaR: a reference corpus for Dutch, 2008, LREC.

[12] Jochen L. Leidner. Toponym resolution in text, 2007.

[13] Paola Monachesi, et al. Adding Semantic Role Annotation to a Corpus of Written Dutch, 2007, LAW@ACL.

[14] Antal van den Bosch, et al. Integrating Seed Names and ngrams for a Named Entity List and Classifier, 2000, LREC.

[15] Mitchell P. Marcus, et al. Adding Semantic Annotation to the Penn TreeBank, 1998.

[16] Wendy G. Lehnert, et al. A trainable approach to coreference resolution for information extraction, 1996.

[17] Walter Daelemans, et al. Learning Dutch Coreference Resolution, 2005, CLIN.

[18] Kees van Deemter, et al. On Coreferring: Coreference in MUC and Related Annotation Schemes, 2000, CL.

[19] Raphael Volz, et al. Towards Ontology-based Disambiguation of Geographical Identifiers, 2007, I3.

[20] Walter Daelemans, et al. Coreference resolution for extracting answers for Dutch, 2008, LREC 2008.

[21] O. Babko-malaya. Guidelines for Propbank framers, 2005.