How compatible are our discourse annotations? Insights from mapping RST-DT and PDTB annotations

Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes the annotations difficult to compare, which in turn prevents researchers from searching the corpora in a unified way or using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapping the relational labels of different frameworks to each other, but these proposals have so far not been validated against existing annotations. The two largest resources annotated with discourse relations, the Penn Discourse Treebank (PDTB) and the Rhetorical Structure Theory Discourse Treebank (RST-DT), have however been annotated on the same text, allowing for a direct comparison of the annotation layers. We propose a method for automatically aligning the discourse segments, and then evaluate existing mapping proposals by comparing the empirically observed mappings against the proposed ones. Our analysis highlights the influence of segmentation on subsequent discourse relation labeling, and shows that while agreement between frameworks is reasonable for explicit relations, agreement on implicit relations is low. We identify several sources of systematic discrepancies between the two annotation schemes and discuss consequences of these discrepancies for future annotation and for the training of automatic discourse relation labellers.
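The alignment-then-comparison idea can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given here. It assumes that each relation is reduced to a label plus two character-offset spans, that a PDTB relation is aligned to the RST relation whose two spans best cover its two arguments, and that a proposed mapping is a dictionary from PDTB labels to sets of RST labels. The function names, the threshold, the toy spans, and the toy mapping are all hypothetical.

```python
def overlap(a, b):
    """Number of characters shared by two (start, end) spans, end exclusive."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage(pdtb_arg, rst_span):
    """Fraction of the PDTB argument's characters covered by the RST span."""
    length = pdtb_arg[1] - pdtb_arg[0]
    return overlap(pdtb_arg, rst_span) / length if length > 0 else 0.0


def align(pdtb, rst, threshold=0.5):
    """Pair each PDTB relation with its best-covering RST relation.

    Simplifying assumption: Arg1 is compared with the first RST span and
    Arg2 with the second; real data would need to handle other orders.
    """
    pairs = []
    for p_label, p_arg1, p_arg2 in pdtb:
        best, best_score = None, 0.0
        for r_label, r_first, r_second in rst:
            # Both arguments must be covered, so take the weaker of the two.
            score = min(coverage(p_arg1, r_first), coverage(p_arg2, r_second))
            if score > best_score:
                best, best_score = r_label, score
        if best is not None and best_score >= threshold:
            pairs.append((p_label, best))
    return pairs


def agreement(pairs, proposed):
    """Share of aligned pairs whose RST label is licensed by the mapping."""
    if not pairs:
        return 0.0
    hits = sum(1 for p, r in pairs if r in proposed.get(p, set()))
    return hits / len(pairs)


# Toy data (hypothetical): (label, first span, second span)
pdtb = [("Contingency.Cause", (0, 20), (21, 40)),
        ("Comparison.Contrast", (41, 60), (61, 80))]
rst = [("Explanation", (0, 20), (21, 45)),
       ("Elaboration", (41, 60), (61, 80))]
# Toy mapping (hypothetical, not one of the published proposals)
proposed = {"Contingency.Cause": {"Explanation", "Cause"},
            "Comparison.Contrast": {"Contrast", "Antithesis"}}

pairs = align(pdtb, rst)
print(pairs)
print(agreement(pairs, proposed))
```

Taking the minimum of the two coverage scores means a pair only counts as aligned when both arguments match. This is one simple way to keep segmentation differences from being mistaken for label disagreements.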
