Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation

The Penn Discourse Treebank (PDTB) was released to the public in 2008. It remains the largest manually annotated corpus of discourse relations to date. Its focus on discourse relations that are either lexically grounded in explicit discourse connectives or associated with sentential adjacency has not only facilitated its use in language technology and psycholinguistics but has also spawned the annotation of comparable corpora in other languages and genres. Given this situation, this paper has four aims: (1) to provide a comprehensive introduction to the PDTB for those who are unfamiliar with it; (2) to correct some wrong (or perhaps inadvertent) assumptions about the PDTB and its annotation that may have weakened previous results or the performance of decision procedures induced from the data; (3) to explain variations seen in the annotation of comparable resources in other languages and genres, which should allow developers of future comparable resources to recognize whether the variations are relevant to them; and (4) to enumerate and explain relationships between PDTB annotation and complementary annotation of other linguistic phenomena. The paper draws on work done by ourselves and others since the corpus was released.
