How consistent are our discourse annotations? Insights from mapping RST-DT and PDTB annotations

Discourse-annotated corpora are an important resource for the community. However, these corpora are often annotated according to different frameworks, which makes it difficult to compare their annotations. This is unfortunate, since mapping the existing annotations onto each other would yield more (training) data for researchers in automatic discourse relation processing as well as in linguistics and psycholinguistics. In this article, we present an effort to map two large corpora onto each other: the Penn Discourse Treebank (PDTB) and the Rhetorical Structure Theory Discourse Treebank (RST-DT). We first propose a method for aligning the discourse segments of the two corpora, and then evaluate the observed mappings against the expected mappings, separately for explicit and implicit relations. We find that while agreement on explicit relations is reasonable, agreement between the frameworks on implicit relations is astonishingly low. We identify sources of systematic discrepancies between the two annotation schemes; many of the differences in annotation can be traced back to the different operationalizations and goals of the PDTB and RST frameworks. We discuss the consequences of these discrepancies for future annotation, as well as the usability of the mapped data for theoretical studies and for training automatic discourse relation labellers.
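The evaluation step summarized above (comparing observed against expected mappings, separately for explicit and implicit relations) can be illustrated with a minimal sketch. It assumes that segment alignment has already produced pairs of relations, each carrying one PDTB sense and one RST-DT relation label, and that an expected-mapping table lists which RST-DT labels count as compatible with each PDTB sense. The `AlignedRelation` type, the `EXPECTED` table and the `agreement_by_type` function are hypothetical illustrations, not the mapping or procedure used in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRelation:
    """One PDTB relation aligned with one RST-DT relation (hypothetical type)."""
    relation_type: str  # PDTB relation type: "explicit" or "implicit"
    pdtb_sense: str     # e.g. "Contingency.Cause"
    rst_relation: str   # e.g. "cause"


# Hypothetical expected-mapping table: for each PDTB sense, the RST-DT labels
# that would count as agreeing. Illustrative only, not the article's mapping.
EXPECTED = {
    "Contingency.Cause": {"cause", "result", "explanation-argumentative"},
    "Comparison.Contrast": {"contrast", "antithesis"},
    "Expansion.Conjunction": {"list", "elaboration-additional"},
}


def agreement_by_type(relations):
    """Return, per relation type, the share of aligned relations whose
    RST-DT label is among the labels expected for their PDTB sense."""
    matched = defaultdict(int)
    total = defaultdict(int)
    for rel in relations:
        total[rel.relation_type] += 1
        # Senses missing from the table never count as a match.
        if rel.rst_relation in EXPECTED.get(rel.pdtb_sense, set()):
            matched[rel.relation_type] += 1
    return {t: matched[t] / total[t] for t in total}


sample = [
    AlignedRelation("explicit", "Contingency.Cause", "cause"),
    AlignedRelation("explicit", "Comparison.Contrast", "contrast"),
    AlignedRelation("implicit", "Contingency.Cause", "elaboration-additional"),
    AlignedRelation("implicit", "Expansion.Conjunction", "list"),
]
print(agreement_by_type(sample))  # {'explicit': 1.0, 'implicit': 0.5}
```

Keeping the counts per relation type, rather than pooling them, is what allows explicit and implicit relations to be reported separately, mirroring the split described in the abstract.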
