On the Development of the RST Spanish Treebank

In this article we present the RST Spanish Treebank, the first Spanish corpus annotated with rhetorical relations. We describe the characteristics of the corpus, the annotation criteria and procedure, the inter-annotator agreement, and other related aspects. We also present the interface we have developed for searching the annotated texts of the corpus.
