A Recursive Annotation Scheme for Referential Information Status

We present a robust and detailed annotation scheme for information status that is easy to use, is semantically rather than cognitively motivated, and achieves reasonable inter-annotator agreement. The scheme rests on two main assumptions: first, that information status strongly depends on (in)definiteness, and second, that it should be understood as a property of referents rather than of words. The scheme therefore relies on overt (in)definiteness marking and provides separate categories for definite and indefinite expressions. Definites are grouped according to the information source by which their referent is identified. A special feature of the scheme is that non-anaphoric expressions (e.g., names) are classified according to whether their referents are likely to be known or unknown to an expected audience. The scheme also provides a solution for annotating complex nominal expressions that may recursively contain embedded expressions. In annotating a corpus of German radio news bulletins, we achieved a kappa score of .66 for the full scheme and a kappa of .78 for a core scheme of six top-level categories.
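The kappa values above are chance-corrected agreement scores, and the gap between the full scheme and the six-category core scheme reflects the effect of collapsing fine-grained labels. The following is a minimal sketch of Cohen's two-rater kappa in Python that illustrates both points. It assumes two annotators who label the same sequence of referring expressions. The label names (e.g. `given-pronoun`), the hyphen convention used by the hypothetical helper `to_top_level`, and the toy data are all invented for illustration. They are not the scheme's actual category inventory and do not reproduce the corpus results. Whether the reported scores use Cohen's kappa or a multi-rater variant is not stated here.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected (chance) agreement from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def to_top_level(label):
    # Hypothetical convention: "given-pronoun" -> "given"; labels without a
    # hyphen (e.g. "new") are already top-level.
    return label.split("-", 1)[0]


# Invented toy annotations of six referring expressions.
annotator_a = ["given-pronoun", "given-repeated", "unused-known",
               "new", "bridging", "unused-unknown"]
annotator_b = ["given-pronoun", "given-pronoun", "unused-unknown",
               "new", "new", "unused-unknown"]

full = cohen_kappa(annotator_a, annotator_b)
core = cohen_kappa([to_top_level(x) for x in annotator_a],
                   [to_top_level(x) for x in annotator_b])
print(f"full scheme: {full:.2f}")  # prints "full scheme: 0.40"
print(f"core scheme: {core:.2f}")  # prints "core scheme: 0.77"
```

In this toy example, two disagreements fall within a single top-level category (`given-*` and `unused-*`). They disappear once labels are collapsed, so kappa rises. This mirrors, in miniature, why a coarser core scheme can score higher than the full scheme.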
