Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory

We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.
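
The abstract does not say how consistency between annotators was quantified. Cohen's kappa is one widely used statistic for this: it compares observed agreement with the agreement expected by chance from each annotator's label distribution. The Python sketch below is a minimal illustration under that assumption, not the measurement procedure used for this corpus. The function name `cohen_kappa` and the relation labels in the example are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items.

    Illustrative sketch only; it is not presented as the agreement
    measure used to build the corpus described above.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation labels assigned by two annotators to six discourse units.
annotator_1 = ["Elaboration", "Attribution", "Contrast", "Elaboration", "Cause", "Elaboration"]
annotator_2 = ["Elaboration", "Attribution", "Elaboration", "Elaboration", "Cause", "Contrast"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.5
```

In the example, the annotators agree on 4 of 6 units (p_o = 2/3). Chance agreement from the label distributions is 12/36 = 1/3, so kappa = (2/3 - 1/3) / (1 - 1/3) = 0.5. Kappa is reported instead of raw percent agreement because it discounts agreement that skewed label distributions would produce by accident.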
