Translationese as a Language in “Multilingual” NMT

Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being rated lower by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target text? No training data pairs original source text with original target text, so we train sentence-level classifiers to distinguish translationese from original target text, and use these classifiers to tag the training data for an NMT model. Using this technique, we bias the model to produce more natural outputs at test time, yielding gains in human evaluation scores on both accuracy and fluency. Additionally, we demonstrate that it is possible to bias the model to produce translationese and game the BLEU score, increasing it while decreasing human-rated quality. We analyze these models using metrics that measure the degree of translationese in the output, and present an analysis of the capriciousness of heuristic-based training-data tagging.
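
The tagging step described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the tag strings `<orig>` and `<trans>`, the choice to prepend the tag to the source sentence, the 0.5 threshold, and the toy scoring function, which merely stands in for a trained sentence-level classifier.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical tag strings; the abstract does not specify the tag format.
ORIGINAL_TAG = "<orig>"
TRANSLATIONESE_TAG = "<trans>"


def tag_training_pairs(
    pairs: Iterable[Tuple[str, str]],
    translationese_prob: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Prepend a tag to each source sentence based on its target side.

    `translationese_prob` stands in for a trained classifier that scores
    how likely a target sentence is to be translationese.
    """
    tagged = []
    for source, target in pairs:
        is_translationese = translationese_prob(target) >= threshold
        tag = TRANSLATIONESE_TAG if is_translationese else ORIGINAL_TAG
        tagged.append((f"{tag} {source}", target))
    return tagged


def tag_for_inference(source: str, natural: bool = True) -> str:
    """Request natural (original-style) or translationese output at test time."""
    tag = ORIGINAL_TAG if natural else TRANSLATIONESE_TAG
    return f"{tag} {source}"


def toy_translationese_prob(sentence: str) -> float:
    """Placeholder scorer, NOT a real classifier: flags one stilted word."""
    return 0.9 if "hereby" in sentence.lower() else 0.1


if __name__ == "__main__":
    data = [
        ("Hallo Welt", "Hello world"),
        ("Ich erkläre", "I hereby declare"),
    ]
    for src, tgt in tag_training_pairs(data, toy_translationese_prob):
        print(src, "=>", tgt)
    print(tag_for_inference("Guten Morgen"))
```

Under these assumptions, the same model can be steered toward either style at test time by choosing the tag, which corresponds to the contrast the abstract draws between more natural output and BLEU-gaming translationese output.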
