A qualitative comparison method for rhetorical structures: identifying different discourse structures in multilingual corpora

Explaining why the same passage may have different rhetorical structures when conveyed in different languages remains an open question. Starting from a trilingual translation corpus, this paper aims to provide a new qualitative method for comparing rhetorical structures across languages and to specify why translated texts may differ in their rhetorical structures. To achieve these aims, we carried out a contrastive analysis of a corpus of parallel English, Spanish and Basque texts using Rhetorical Structure Theory (RST). We propose a method to describe the main linguistic differences among the rhetorical structures of the three languages in the two annotation stages (segmentation and rhetorical analysis). This new type of comparison has important advantages over the quantitative method usually employed: it provides an accurate measurement of inter-annotator agreement, and it pinpoints the sources of disagreement among annotators. Using this method, we also show how translation strategies affect discourse structure.
