Manual for the Annotation of in-document Referential Relations

This paper documents our annotation of in-document coreference and of anaphora/cataphora. It defines the textual and semantic relation types and the category system used for the annotation, and describes the potential markables in the framework of coreference and anaphora/cataphora resolution. It also describes the database containing the annotated texts and illustrates the annotation tools used for the task. The overall aim is to provide comprehensive and comprehensible guidelines both for users of our released data and for researchers designing a similar task. To this end, it not only describes the annotation background and process but also sets out how controversial cases were discussed and decided in order to arrive at a reliable annotation standard.
