Unsupervised Neural Machine Translation Initialized by Unsupervised Statistical Machine Translation

Recent work achieved remarkable results in training neural machine translation (NMT) systems in a fully unsupervised way, with new and dedicated architectures that rely on monolingual corpora only. In this work, we propose to define unsupervised NMT (UNMT) as NMT trained with the supervision of synthetic bilingual data. Our approach straightforwardly enables the use of state-of-the-art architectures proposed for supervised NMT by replacing human-made bilingual data with synthetic bilingual data for training. We propose to initialize the training of UNMT with synthetic bilingual data generated by unsupervised statistical machine translation (USMT). The UNMT system is then incrementally improved using back-translation. Our preliminary experiments show that our approach achieves a new state-of-the-art for unsupervised machine translation on the WMT16 German--English news translation task, for both translation directions.

[1]  Eneko Agirre,et al.  A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings , 2018, ACL.

[2]  Wei Chen,et al.  Unsupervised Neural Machine Translation with Weight Sharing , 2018 .

[3]  Peng Xu,et al.  A Systematic Comparison of Phrase Table Pruning Techniques , 2012, EMNLP.

[4]  George F. Foster,et al.  Batch Tuning Strategies for Statistical Machine Translation , 2012, NAACL.

[5]  Eneko Agirre,et al.  Unsupervised Neural Machine Translation , 2017, ICLR.

[6]  Mamoru Komachi,et al.  Building a Monolingual Parallel Corpus for Text Simplification Using Sentence Similarity Based on Alignment between Word Embeddings , 2016, COLING.

[7]  Hermann Ney,et al.  Discriminative Training and Maximum Entropy Models for Statistical Machine Translation , 2002, ACL.

[8]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[9]  Gholamreza Haffari,et al.  Iterative Back-Translation for Neural Machine Translation , 2018, NMT@ACL.

[10]  Yang Liu,et al.  Joint training for pivot-based neural machine translation , 2017, IJCAI 2017.

[11]  Guillaume Lample,et al.  Unsupervised Machine Translation Using Monolingual Corpora Only , 2017, ICLR.

[12]  Joel D. Martin,et al.  Improving Translation Quality by Discarding Most of the Phrasetable , 2007, EMNLP.

[13]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[14]  Arianna Bisazza,et al.  Neural versus Phrase-Based Machine Translation Quality: a Case Study , 2016, EMNLP.

[15]  Anders Søgaard,et al.  On the Limitations of Unsupervised Bilingual Dictionary Induction , 2018, ACL.

[16]  Rico Sennrich,et al.  Improving Neural Machine Translation Models with Monolingual Data , 2015, ACL.

[17]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[18]  Noah A. Smith,et al.  A Simple, Fast, and Effective Reparameterization of IBM Model 2 , 2013, NAACL.

[19]  Dan Roth,et al.  Unsupervised Sparse Vector Densification for Short Text Similarity , 2015, NAACL.

[20]  Atsushi Fujita,et al.  Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation , 2018, ACM Trans. Asian Low Resour. Lang. Inf. Process..

[21]  Kai Zhao,et al.  Learning Translation Models from Monolingual Continuous Representations , 2015, NAACL.

[22]  Samuel L. Smith,et al.  Offline bilingual word vectors, orthogonal transformations and the inverted softmax , 2017, ICLR.

[23]  David Yarowsky,et al.  Toward Statistical Machine Translation without Parallel Corpora , 2012, EACL 2012.

[24]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[25]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[26]  Guillaume Lample,et al.  Phrase-Based & Neural Unsupervised Machine Translation , 2018, EMNLP.

[27]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[28]  Eneko Agirre,et al.  Unsupervised Statistical Machine Translation , 2018, EMNLP.

[29]  Gholamreza Haffari,et al.  Transductive learning for statistical machine translation , 2007, ACL.

[30]  Philipp Koehn,et al.  Scalable Modified Kneser-Ney Language Model Estimation , 2013, ACL.