Building a Discourse-annotated Dutch Text Corpus

We are compiling a corpus of Dutch texts annotated with discourse structure and lexical cohesion, initially containing 80 texts from expository and persuasive genres. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences. We are also exploring the possibilities of automatic text segmentation and semi-automatic discourse annotation. This paper discusses our design choices in text selection, segmentation, and the annotation of discourse structure and lexical cohesion.
