A step-wise approach to discourse annotation: Towards a reliable categorization of coherence relations

Over the last decades, the annotation of discourse coherence relations has attracted increasing interest from the linguistics research community. Because coherence relations are complex, there is no agreement on an annotation standard, and current annotation methods often lack a systematic ordering of coherence relations. In this article, we investigate the usability of the cognitive approach to coherence relations, developed by Sanders et al. (1992, 1993), for discourse annotation. The theory proposes a taxonomy of coherence relations in terms of four cognitive primitives. We first develop a systematic, step-wise annotation process. The reliability of this annotation scheme is then tested in an annotation experiment with non-trained, non-expert annotators. An implicit and an explicit version of the annotation instruction were created to determine whether the type of instruction influences annotator agreement. The results show that two of the four primitives, polarity and order of the segments, can be applied reliably by non-trained annotators. The other two primitives, basic operation and source of coherence, are more problematic. Participants using the explicit instruction show higher agreement on the primitives than participants using the implicit instruction. These results are comparable to agreement statistics of other discourse corpora annotated by trained, expert annotators. Given that non-trained, non-expert annotators show similar levels of agreement, these results indicate that the cognitive approach to coherence relations is a promising method for annotating discourse.
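To make the notion of per-primitive agreement concrete, the following is a minimal sketch of how agreement between two annotators could be computed separately for each of the four primitives. It uses Cohen's kappa as an illustrative chance-corrected statistic; the abstract does not state which agreement measure the study reports, so this choice is an assumption. The names `PRIMITIVES`, `cohen_kappa`, and `kappa_per_primitive`, the dictionary layout of an annotation, and the example label values are all hypothetical.

```python
from collections import Counter

# The four primitives named in the abstract; the key spellings are illustrative.
PRIMITIVES = ("polarity", "basic_operation", "source_of_coherence", "order")


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: kappa stands in for whatever agreement statistic
    the study actually used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each annotator's label counts.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_per_primitive(annotations_a, annotations_b):
    """Compute kappa separately for each primitive.

    Each annotation is a dict mapping a primitive name to a label
    (hypothetical data layout).
    """
    return {
        p: cohen_kappa(
            [item[p] for item in annotations_a],
            [item[p] for item in annotations_b],
        )
        for p in PRIMITIVES
    }


if __name__ == "__main__":
    # Invented toy data for two annotators and three relations.
    ann_a = [
        {"polarity": "positive", "basic_operation": "causal",
         "source_of_coherence": "objective", "order": "basic"},
        {"polarity": "negative", "basic_operation": "additive",
         "source_of_coherence": "subjective", "order": "non-basic"},
        {"polarity": "positive", "basic_operation": "causal",
         "source_of_coherence": "subjective", "order": "basic"},
    ]
    ann_b = [
        {"polarity": "positive", "basic_operation": "causal",
         "source_of_coherence": "subjective", "order": "basic"},
        {"polarity": "negative", "basic_operation": "causal",
         "source_of_coherence": "subjective", "order": "non-basic"},
        {"polarity": "positive", "basic_operation": "causal",
         "source_of_coherence": "objective", "order": "basic"},
    ]
    for primitive, kappa in kappa_per_primitive(ann_a, ann_b).items():
        print(f"{primitive}: {kappa:.2f}")
```

Scoring each primitive on its own, rather than scoring the final relation label, is what allows a result of the kind reported above, where some primitives are applied reliably and others are not.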

[1]  Rashmi Prasad,et al.  Annotating Discourse Connectives and Their Arguments , 2004, FCP@NAACL-HLT.

[2]  Kai Ming Ting Precision and Recall , 2017, Encyclopedia of Machine Learning and Data Mining.

[3]  Omar Alonso,et al.  Using crowdsourcing for TREC relevance assessment , 2012, Inf. Process. Manag.

[4]  T. Sanders,et al.  Causal connectives in discourse processing: How differences in subjectivity are reflected in eye movements , 2013 .

[5]  T. Sanders,et al.  The acquisition order of coherence relations : On cognitive complexity in discourse , 2008 .

[6]  Liesbeth Degand,et al.  A contrastive study of Dutch and French causal connectives on the speaker involvement scale , 2003 .

[7]  Gosse Bouma,et al.  Building a Discourse-annotated Dutch Text Corpus , 2011 .

[8]  A. Sanford,et al.  Processing causal and diagnostic statements in discourse , 1997 .

[9]  Yannick Versley,et al.  Linguistic Tests for Discourse Relations in the TüBa-D/Z Corpus of Written German , 2012 .

[10]  L. Bloom,et al.  Complex sentences: acquisition of syntactic connectives and the semantic relations they encode , 1980, Journal of Child Language.

[11]  Candace L. Sidner,et al.  Attention, Intentions, and the Structure of Discourse , 1986, CL.

[12]  Thorsten Brants,et al.  Inter-annotator Agreement for a German Newspaper Corpus , 2000, LREC.

[13]  Michael Halliday,et al.  Cohesion in English , 1976 .

[14]  T. Sanders,et al.  The emergence of Dutch connectives; how cumulative cognitive complexity explains the order of acquisition , 2008, Journal of Child Language.

[15]  Katja Markert,et al.  The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic , 2010, LREC.

[16]  Stefanie Nowak,et al.  How reliable are annotations via crowdsourcing: a study about inter-annotator agreement for multi-label image annotation , 2010, MIR '10.

[17]  Ron Artstein,et al.  The Reliability of Anaphoric Annotation, Reconsidered: Taking Ambiguity into Account , 2005, FCA@ACL.

[18]  Jacqueline Evers-Vermeul,et al.  Subjectivity and result marking in Mandarin : A corpus-based investigation , 2013 .

[19]  G. Meade Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory , 2001 .

[20]  T. Sanders Project notes of CLARIN project DiscAn : Towards a Discourse Annotation system for Dutch language corpora , 2012 .

[21]  Alex Lascarides,et al.  Logics of Conversation , 2005, Studies in natural language processing.

[22]  Maki Watanabe,et al.  Discourse Tagging Reference Manual , 2001 .

[23]  Liesbeth Degand,et al.  Form and Function of Causation: A Theoretical and Empirical Investigation of Causal Constructions in Dutch , 2001 .

[24]  Ted Sanders,et al.  The Role of Coherence Relations and Their Linguistic Markers in Text Processing , 2000 .

[25]  Susan Conrad,et al.  4. CORPUS LINGUISTIC APPROACHES FOR DISCOURSE ANALYSIS , 2002, Annual Review of Applied Linguistics.

[26]  T. Sanders Semantic and pragmatic sources of coherence: On the categorization of coherence relations in context , 1997 .

[27]  A. Knott,et al.  Using Linguistic Phenomena to Motivate a Set of Coherence Relations. , 1994 .

[28]  J. Hobbs On the coherence and structure of discourse , 1985 .

[29]  Oswald Ducrot Pragmatique Linguistique: II. Essai d’application: mais – les allusions à l’énonciation – délocutifs, performatifs, discours indirect , 1980 .

[30]  H. Maat,et al.  Domains of use or subjectivity? The distribution of three Dutch causal connectives explained , 2000 .

[31]  Ani Nenkova,et al.  Using Syntax to Disambiguate Explicit Discourse Connectives in Text , 2009, ACL.

[32]  Liesbeth Degand,et al.  Coding coherence relations: Reliability and validity , 2010 .

[33]  Ron Artstein,et al.  Survey Article: Inter-Coder Agreement for Computational Linguistics , 2008, CL.

[34]  Manfred Stede,et al.  The Potsdam Commentary Corpus , 2004, ACL 2004.

[35]  Sandrine Zufferey,et al.  “Car, parce que, puisque” revisited: Three empirical studies on French causal connectives , 2012 .

[36]  M. Pickering,et al.  Influence of Connectives on Language Comprehension: Eye tracking Evidence for Incremental Interpretation , 1997 .

[37]  Andrew Kehler,et al.  Coherence, reference, and the theory of grammar , 2002, CSLI lecture notes series.

[38]  Leo G. M. Noordman,et al.  Memory-based processing in understanding causal information , 1998 .

[39]  Leo G. M. Noordman,et al.  Coherence relations in a cognitive theory of discourse representation , 1993 .

[40]  Leo G. M. Noordman,et al.  Toward a taxonomy of coherence relations , 1992 .

[41]  Wilbert Spooren,et al.  Causal categories in discourse: Converging evidence from language use , 2009 .

[42]  Livio Robaldo,et al.  The Penn Discourse TreeBank 2.0. , 2008, LREC.

[43]  Jerry R. Hobbs Coherence and Coreference , 1979, Cogn. Sci..

[44]  T. Sanders,et al.  Causality and subjectivity in discourse: The meaning and use of causal connectives in spontaneous conversation, chat interactions and written text , 2014 .

[45]  T. Sanders,et al.  Subjectivity and prototype structure in causal connectives: a cross-linguistic perspective , 2012 .

[46]  Klaus Krippendorff,et al.  Content Analysis: An Introduction to Its Methodology , 1980 .

[47]  William C. Mann,et al.  Rhetorical Structure Theory: Toward a functional theory of text organization , 1988 .

[48]  Jean Carletta,et al.  Assessing Agreement on Classification Tasks: The Kappa Statistic , 1996, CL.

[49]  Arie Verhagen,et al.  Causality in verbs and in discourse connectives : Converging evidence of cross-level parallels in Dutch linguistic categorization , 2008 .

[50]  M. Pit,et al.  Cross-linguistic analyses of backward causal connectives in Dutch, German and French , 2007 .

[51]  J. Evers-Vermeul The development of Dutch connectives : change and acquisition as windows on form-function relations , 2005 .

[52]  Harry Bunt,et al.  Semantic Relations in Discourse: The Current State of ISO 24617-8 , 2015, ACL 2015.

[53]  Ewald Lang,et al.  The semantics of coordination , 1984 .

[54]  W. A. Scott,et al.  Reliability of Content Analysis: The Case of Nominal Scale Coding , 1955 .

[55]  Leo G. M. Noordman,et al.  On the processing of causal relations , 2000 .

[56]  T. Sanders,et al.  The classification of coherence relations and their linguistic markers: An exploration of two languages , 1998 .