Linguistic Tests for Discourse Relations in the TüBa-D/Z Corpus of Written German

Discourse structure and discourse relations are an important ingredient in text-analysis systems that go beyond the boundary of the single clause. Discourse relations often convey important additional information about the connection between two clauses, such as causality, and are widely believed to influence aspects of reference resolution. Even more than referential annotation, discourse relation annotation is made difficult by two factors. There is no general consensus on which underlying linguistic phenomena should be targeted. There is also a lack of strong predictions about the possible or permissible interactions between these phenomena. It is sometimes claimed that the structuring of discourse is only weakly constrained, so that any attempt to capture discourse structure and discourse relations will inevitably yield annotations of poor reproducibility. In this paper, we argue instead that an explicit notion of the relata of discourse relations makes it possible to delimit the scope of the annotation and to draw on theoretical accounts of the linguistic phenomena involved. This can be done without giving up theory-neutrality, which is essential for keeping a resource useful to a large community of users. In the first part of this article, we present the general choices that have to be made when designing an annotation scheme for discourse structure and discourse relations. In the second part, we present the scheme used in our annotation of selected articles from the TüBa-D/Z treebank of German (Telljohann et al., 2009). This scheme is theory-neutral, but it is informed by more detailed linguistic knowledge in the form of linguistic tests that can help annotators disambiguate between several plausible relations.
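
The abstract does not spell out the tests themselves. As a rough illustration of the general idea only, the following Python sketch treats each linguistic test as a yes/no question that an annotator answers for a pair of segments. It then keeps every candidate relation whose requirements are not contradicted by those answers. The test names, the German connectives in the questions, the relation labels, and the REQUIREMENTS table are all hypothetical and are not taken from the annotation scheme described here.

```python
# A minimal sketch, assuming linguistic tests can be modelled as yes/no
# questions answered by an annotator. All test names, questions, and
# relation labels below are hypothetical illustrations.

TESTS = {
    "causal_connective": "Can 'weil' (because) link the two segments without changing the meaning?",
    "contrast_connective": "Can 'aber' (but) link the two segments without changing the meaning?",
    "same_situation": "Does the second segment describe the same situation as the first in more detail?",
}

# Hypothetical relation inventory: the test answers each relation requires.
REQUIREMENTS = {
    "Cause": {"causal_connective": True},
    "Contrast": {"contrast_connective": True},
    "Elaboration": {"same_situation": True, "causal_connective": False},
}


def plausible_relations(answers, requirements=REQUIREMENTS):
    """Return the relations not ruled out by the annotator's test answers.

    An unanswered test is non-decisive: a relation is only excluded
    when some answer contradicts one of its requirements.
    """
    unknown = set(answers) - set(TESTS)
    if unknown:
        raise ValueError(f"unknown tests: {sorted(unknown)}")
    return [
        relation
        for relation, required in requirements.items()
        if all(answers.get(test, expected) == expected
               for test, expected in required.items())
    ]


if __name__ == "__main__":
    answers = {"causal_connective": True, "contrast_connective": False}
    for test, answer in answers.items():
        print(f"{TESTS[test]} -> {answer}")
    print(plausible_relations(answers))  # ['Cause']
```

With these hypothetical tables, accepting the causal connective and rejecting the contrastive one leaves only Cause. Answering yes to same_situation and no to causal_connective still leaves both Contrast and Elaboration, because the contrast test was not asked. Treating unanswered tests as non-decisive is a design choice of this sketch: the tests narrow the set of plausible relations rather than forcing a single label.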
