What Can Unsupervised Machine Translation Contribute to High-Resource Language Pairs?

Whereas existing literature on unsupervised machine translation (MT) focuses on exploiting unsupervised techniques for low-resource language pairs where bilingual training data is scarce or unavailable, we investigate whether unsupervised MT can also improve translation quality for high-resource language pairs, where sufficient bitext does exist. We compare the style of correct translations generated by supervised and unsupervised MT and find that the unsupervised output is less monotonic and more natural than the supervised output. We demonstrate a way to combine the benefits of unsupervised and supervised MT into a single system, resulting in better human evaluations of quality and fluency. Our results open the door to discussions about the potential contributions of unsupervised MT in high-resource settings, and about how supervised and unsupervised systems might be mutually beneficial.
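
The claim that unsupervised output is "less monotonic" can be made concrete with a reordering measure over word alignments, for example Kendall's tau between source and aligned target positions. The sketch below is an illustrative assumption, not the paper's implementation; the alignment format and example links are hypothetical.

```python
# A minimal sketch (assumed, not the paper's method) of one way to quantify how
# monotonic a translation is: given word-alignment links between a source
# sentence and its translation, compute Kendall's tau over the aligned target
# positions. Tau near 1.0 means the translation follows source word order
# closely; lower values indicate more reordering.

from itertools import combinations
from typing import List, Tuple


def kendall_tau_monotonicity(alignment: List[Tuple[int, int]]) -> float:
    """Kendall's tau over (source_pos, target_pos) alignment links.

    alignment: list of (source index, target index) pairs, e.g. produced by an
    automatic word aligner (hypothetical input here). Returns a value in
    [-1, 1]; 1.0 means fully monotonic order.
    """
    # Order links by source position, then compare every pair of target positions.
    targets = [t for _, t in sorted(alignment)]
    pairs = list(combinations(range(len(targets)), 2))
    if not pairs:
        return 1.0
    concordant = sum(1 for i, j in pairs if targets[i] < targets[j])
    discordant = sum(1 for i, j in pairs if targets[i] > targets[j])
    return (concordant - discordant) / len(pairs)


if __name__ == "__main__":
    # Hypothetical alignments for the same source sentence:
    monotonic_links = [(0, 0), (1, 1), (2, 2), (3, 3)]  # word-by-word order
    reordered_links = [(0, 2), (1, 0), (2, 3), (3, 1)]  # heavy reordering
    print(kendall_tau_monotonicity(monotonic_links))  # 1.0
    print(kendall_tau_monotonicity(reordered_links))  # 0.0 for this example
```

Under this kind of measure, a system whose translations track source word order closely scores near 1.0, while freer, more natural-sounding reorderings pull the score down; comparing score distributions across supervised and unsupervised outputs is one plausible way to operationalize the monotonicity comparison described above.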
