Towards Full Text Shallow Discourse Relation Annotation: Experiments with Cross-Paragraph Implicit Relations in the PDTB

Full-text discourse parsing relies on texts comprehensively annotated with discourse relations. To this end, we address a significant gap in the inter-sentential discourse relations annotated in the Penn Discourse Treebank (PDTB): the class of cross-paragraph implicit relations, which account for 30% of inter-sentential relations in the corpus. We present an annotation study that explores the incidence of adjacent vs. non-adjacent implicit relations in cross-paragraph contexts and the relative difficulty of annotating them. Our experiments show a high incidence of non-adjacent relations that are difficult to annotate reliably. This suggests that it is practical to back off from annotating them in order to reduce noise for corpus-based studies. Our resulting guidelines follow the PDTB adjacency constraint for implicit relations while employing an underspecified representation of non-adjacent implicits, and yield 62% inter-annotator agreement on this task.
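To make the adjacency constraint and the underspecified representation more concrete, here is a minimal Python sketch built on assumptions the abstract does not state. Each paragraph boundary receives one annotation whose Arg2 is the paragraph's first sentence. If Arg1 is the adjacent sentence (the last sentence of the previous paragraph), a sense label is recorded. Otherwise the relation collapses into a single hypothetical `NonAdj` label. Agreement is taken to be simple percent agreement over the boundaries both annotators labeled; the abstract does not say which agreement measure produced the 62% figure. The class `BoundaryAnnotation`, the function `percent_agreement`, and the toy labels are illustrative only, not the authors' tooling or data.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical single label standing in for the underspecified
# representation of non-adjacent implicit relations (assumption).
NON_ADJACENT = "NonAdj"


@dataclass(frozen=True)
class BoundaryAnnotation:
    """One annotator's decision for one paragraph boundary (hypothetical).

    Arg2 is assumed to be the first sentence of paragraph `para_index`.
    `adjacent` is True when Arg1 is the last sentence of the previous
    paragraph, i.e. the relation satisfies the adjacency constraint.
    """
    para_index: int
    adjacent: bool
    sense: str = ""

    def label(self) -> str:
        # Adjacent implicits keep their full sense label; non-adjacent ones
        # are reduced to one underspecified label (no Arg1 span, no sense).
        return self.sense if self.adjacent else NON_ADJACENT


def percent_agreement(a: list[BoundaryAnnotation],
                      b: list[BoundaryAnnotation]) -> float:
    """Fraction of boundaries labeled by both annotators with the same label."""
    b_by_para = {ann.para_index: ann for ann in b}
    shared = [ann for ann in a if ann.para_index in b_by_para]
    if not shared:
        return 0.0
    matches = sum(ann.label() == b_by_para[ann.para_index].label()
                  for ann in shared)
    return matches / len(shared)


# Toy annotations for three paragraph boundaries (invented for illustration).
ann_a = [
    BoundaryAnnotation(1, True, "Expansion.Conjunction"),
    BoundaryAnnotation(2, False),
    BoundaryAnnotation(3, True, "Contingency.Cause"),
]
ann_b = [
    BoundaryAnnotation(1, True, "Expansion.Conjunction"),
    BoundaryAnnotation(2, False, "Comparison.Contrast"),
    BoundaryAnnotation(3, True, "Expansion.Instantiation"),
]

print(percent_agreement(ann_a, ann_b))  # → 0.6666666666666666
```

In this toy example the annotators' different senses at boundary 2 do not count as a disagreement: once both mark the relation as non-adjacent, the underspecified label hides any difference in where they would place Arg1 or which sense they would assign. This illustrates one way that backing off from full annotation of non-adjacent implicits can trade detail for reliability.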
