A Natural Diet: Towards Improving Naturalness of Machine Translation Output

Machine translation (MT) evaluation often focuses on accuracy and fluency, without paying much attention to translation style. This means that, even when considered accurate and fluent, MT output can still sound less natural than high-quality human translations or text originally written in the target language. MT output notably exhibits lower lexical diversity and employs constructs that mirror those in the source sentence. In this work we propose a method for training MT systems to achieve a more natural style, i.e., one that matches the style of text originally written in the target language. Our method tags parallel training data according to the naturalness of the target side by contrasting language models trained on natural and translated data. Tagging the data allows us to place greater emphasis during training on examples whose target side was originally written in the target language. Automatic metrics show that the resulting models achieve lexical richness on par with human translations and mimic a style much closer to that of sentences originally written in the target language. Furthermore, we find that human experts prefer their output over the baseline translations.
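
The tagging step can be pictured as in the minimal sketch below. This is not the paper's implementation: it assumes n-gram language models scored with the `kenlm` package, uses a simple comparison of log-probabilities as the naturalness criterion, and prepends an illustrative `<natural>` tag to the source side, in the spirit of tagged back-translation; the model paths and tag string are placeholders.

```python
# Sketch of naturalness tagging via contrastive language models.
# Assumptions: two pre-trained LMs, one on text originally written in the
# target language ("natural") and one on human-translated text ("translated"),
# both scored with the kenlm Python bindings. Paths and the tag are illustrative.
import kenlm

natural_lm = kenlm.Model("natural_target.arpa")        # hypothetical model file
translated_lm = kenlm.Model("translated_target.arpa")  # hypothetical model file

NATURAL_TAG = "<natural>"  # illustrative source-side tag


def is_natural(target_sentence: str) -> bool:
    """Label a target sentence as natural if the 'natural' LM assigns it a
    higher log-probability than the 'translated' LM."""
    return natural_lm.score(target_sentence) > translated_lm.score(target_sentence)


def tag_example(source: str, target: str) -> tuple[str, str]:
    """Prepend the tag to the source side when the target side looks natural,
    so the trained model can later be prompted for the natural style."""
    if is_natural(target):
        return f"{NATURAL_TAG} {source}", target
    return source, target
```

At inference time, the same tag would be prepended to every input sentence to request the natural style; the exact decision rule and tagging scheme used in the paper may differ from this sketch.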
