Motifs in Reconstructed RST Discourse Trees

Abstract In line with the compositionality criterion and hierarchy principle of Rhetorical Structure Theory (RST), this study converts each tree in the RST Discourse Treebank into three trees whose terminal nodes are exclusively clauses, sentences, and paragraphs, respectively. It examines the motifs of rhetorical relations under three taxonomies at these three granularity levels, together with the lengths of these motifs, and finds that the motifs follow the negative binomial distribution and the motif lengths follow the positive negative binomial distribution. The study demonstrates that RST relational analysis can be applied between terminal units of the same level, and that it works at various granularities.
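
As a rough illustration of what extracting motifs and motif lengths from a relation sequence might involve, the sketch below segments a linear sequence of relation labels into motifs. The abstract does not state the motif definition used, so the sketch assumes one common definition for nominal data: a motif is the longest run of consecutive elements in which no element repeats. The function names and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def segment_motifs(relations):
    """Split a sequence of relation labels into motifs.

    ASSUMPTION: a motif is a maximal run of consecutive labels with no
    repeated label; a new motif starts as soon as a label recurs.
    """
    motifs = []
    current = []
    seen = set()
    for rel in relations:
        if rel in seen:
            # The label already occurs in the current motif: close it.
            motifs.append(tuple(current))
            current = []
            seen = set()
        current.append(rel)
        seen.add(rel)
    if current:
        motifs.append(tuple(current))
    return motifs


def motif_length_frequencies(motifs):
    """Count how many motifs there are of each length."""
    return Counter(len(m) for m in motifs)


# Hypothetical relation sequence, for illustration only.
example = ["elaboration", "attribution", "elaboration",
           "contrast", "contrast", "joint"]
motifs = segment_motifs(example)
lengths = motif_length_frequencies(motifs)
```

The resulting motif frequencies and length frequencies are the kind of data to which a distribution such as the negative binomial could then be fitted; the fitting step itself is not shown here.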
