Discovery of Discourse-Related Language Contrasts through Alignment Discrepancies in English-German Translation

In this paper, we analyse alignment discrepancies for discourse structures in English-German parallel data, that is, sentence pairs in which discourse structures in the source or target text have no alignment in the corresponding parallel sentence. The discourse-related structures are defined as linguistic patterns based on automatic part-of-speech and dependency annotation. Besides alignment errors (existing structures left unaligned), these discrepancies can be caused by language contrasts or by explicitation and implicitation in the translation process. We propose a new approach, including a new type of resource, for corpus-based language contrast analysis and apply it to study and classify the contrasts found in our English-German parallel corpus. As unaligned discourse structures may also result in the loss of discourse information in machine translation (MT) training data, we hope to deliver information in support of discourse-aware MT.
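To make the notion of an alignment discrepancy concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes word alignments in the common `i-j` (source index, target index) text format and simplifies the linguistic patterns to a set of part-of-speech tags, leaving out dependency information. The function names, the tag set, and the toy sentence pair are all hypothetical.

```python
def parse_alignment(line):
    """Parse whitespace-separated 'i-j' links into a set of (source, target) index pairs."""
    return {tuple(int(x) for x in pair.split("-")) for pair in line.split()}


def unaligned_matches(tags, aligned_indices, pattern_tags):
    """Return indices of tokens whose tag matches the pattern but that have no alignment link."""
    return [
        i for i, tag in enumerate(tags)
        if tag in pattern_tags and i not in aligned_indices
    ]


# Toy English source sentence with hypothetical POS tags.
src_tokens = ["However", ",", "he", "left"]
src_tags = ["RB", ",", "PRP", "VBD"]

# Toy German target sentence; "However" has no counterpart.
tgt_tokens = ["Er", "ging"]

# Assumed alignment: "he" -> "Er", "left" -> "ging".
alignment = parse_alignment("2-0 3-1")
aligned_src = {s for s, _ in alignment}

# Hypothetical pattern: adverbs and personal pronouns as discourse-related candidates.
hits = unaligned_matches(src_tags, aligned_src, {"RB", "PRP"})
unaligned_tokens = [src_tokens[i] for i in hits]
print(unaligned_tokens)
```

A hit such as the unaligned "However" would still need manual or further automatic classification as an alignment error, a language contrast, or a case of implicitation.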
