Efficient Unsupervised NMT for Related Languages with Cross-Lingual Language Models and Fidelity Objectives

The most successful approach to Neural Machine Translation (NMT) when only monolingual training data is available, known as unsupervised machine translation, is based on back-translation, in which noisy translations are generated to turn the task into a supervised one. However, back-translation is computationally expensive and inefficient. This work explores a novel, efficient approach to unsupervised NMT. A transformer, initialized with cross-lingual language model weights, is fine-tuned exclusively on monolingual data of the target language by jointly training on a paraphrasing objective and a denoising autoencoder objective. Experiments are conducted on WMT datasets for German-English, French-English, and Romanian-English. Results are competitive with strong baseline unsupervised NMT models, especially for a closely related source language (German) compared with more distant ones (Romanian, French), while requiring about an order of magnitude less training time.
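A denoising autoencoder objective of this kind relies on a noise model that corrupts target-language sentences, which the model then learns to reconstruct. The paper does not spell out its noise function here, but a common choice in unsupervised NMT (following Lample et al., 2018) combines word dropout with local token shuffling. A minimal sketch under that assumption; the function name and parameters are illustrative, not taken from the paper:

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_k=3, seed=None):
    """Corrupt a token sequence for a denoising autoencoder objective:
    randomly drop tokens, then locally shuffle the survivors so each
    token moves at most about `shuffle_k` positions from its slot."""
    rng = random.Random(seed)
    # Word dropout: keep each token with probability 1 - drop_prob.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept:  # never emit an empty sentence
        kept = list(tokens)
    # Local shuffle: sort by (original index + uniform noise in [0, shuffle_k)),
    # which bounds how far any token can drift from its position.
    keys = [i + rng.uniform(0, shuffle_k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

print(add_noise("the cat sat on the mat".split(), seed=0))
```

The autoencoder is then trained to map `add_noise(sentence)` back to `sentence`, while the paraphrasing objective supplies the second, fidelity-oriented training signal.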
