Unsupervised Machine Translation Using Monolingual Corpora Only

Machine translation has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet these approaches still require tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using a single parallel sentence at training time.
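The core idea, a single encoder and decoder shared across both languages so that sentences from either monolingual corpus land in one latent space, can be sketched in a few lines. The code below is a minimal illustrative sketch, not the paper's actual architecture (which uses a bidirectional LSTM encoder with attention, adversarial regularization of the latent space, and iterative back-translation); the class name `SharedLatentTranslator`, the GRU backbone, and all sizes are hypothetical choices made for brevity.

```python
import torch
import torch.nn as nn

class SharedLatentTranslator(nn.Module):
    """Toy sketch: one encoder maps sentences from either language into a
    shared latent space; one decoder, conditioned on a target-language
    embedding, reconstructs a sentence from that space."""

    def __init__(self, vocab_size, dim=256, n_langs=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)         # token embeddings shared by both languages
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # shared encoder -> shared latent space
        self.lang_embed = nn.Embedding(n_langs, dim)       # tells the decoder which language to produce
        self.decoder = nn.GRU(dim, dim, batch_first=True)  # shared decoder
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, tokens, tgt_lang):
        # Encode into the shared latent space (final hidden state).
        _, z = self.encoder(self.embed(tokens))            # z: (1, batch, dim)
        # Decode conditioned on the target language, teacher-forced on the inputs.
        lang = self.lang_embed(tgt_lang).unsqueeze(1)      # (batch, 1, dim)
        out, _ = self.decoder(self.embed(tokens) + lang, z)
        return self.proj(out)                              # logits over the vocabulary
```

A usage sketch of one training objective, denoising autoencoding: corrupt a sentence (here, crude random word dropout) and train the model to reconstruct the original in the same language. The full method additionally trains a translation-then-reconstruction path (back-translation) and an adversarial discriminator that pushes the latent distributions of the two languages together; both are omitted here.

```python
vocab_size, pad = 1000, 0
model = SharedLatentTranslator(vocab_size)
x = torch.randint(1, vocab_size, (4, 12))                  # toy batch of language-0 sentences
noisy = x.masked_fill(torch.rand_like(x, dtype=torch.float) < 0.1, pad)
logits = model(noisy, tgt_lang=torch.zeros(4, dtype=torch.long))
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), x.reshape(-1))
loss.backward()
```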
