Transfer Learning across Languages from Someone Else's NMT Model

Neural machine translation is demanding in terms of training time, hardware resources, and the size and quantity of parallel training sentences. We propose a simple transfer learning method that recycles already trained models for different language pairs, with no need to modify the model architecture, hyper-parameters, or vocabulary. We achieve better translation quality and shorter convergence times than when training from random initialization. To show the applicability of our method, we recycle a Transformer model trained by different researchers for translating English-to-Czech and use it to seed models for seven language pairs. Our translation models are significantly better even when the re-used model's language pair is not linguistically related to the child language pair, especially for low-resource languages. Our approach needs only a single pretrained model for transfer to all the various language pairs. Additionally, we improve this approach with a simple vocabulary transformation. We analyze the behavior of transfer learning to understand the gains from unrelated languages.
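
The core of the method can be illustrated as warm-starting a child model from the parent model's checkpoint. Below is a minimal sketch of this idea in PyTorch; the file name, vocabulary size, and toy architecture are illustrative assumptions (the paper's experiments use a Transformer trained in Marian), not the authors' actual setup.

```python
# Minimal sketch of the "recycling" idea: initialize a child NMT model with the
# weights of an already-trained parent model (same architecture, hyper-parameters,
# and shared subword vocabulary), then continue training on the child language pair.
# The checkpoint name and toy model below are illustrative assumptions.
import torch
import torch.nn as nn

def build_model(vocab_size=32000, d_model=512):
    # Identical architecture for parent and child; the method requires no changes
    # to model structure, hyper-parameters, or vocabulary.
    return nn.Sequential(
        nn.Embedding(vocab_size, d_model),
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        nn.Linear(d_model, vocab_size),
    )

# Parent: e.g. an English-to-Czech model trained by someone else.
parent = build_model()
torch.save(parent.state_dict(), "parent_en_cs.pt")  # stands in for a released checkpoint

# Child: e.g. a low-resource pair; identical architecture, warm-started from the parent.
child = build_model()
child.load_state_dict(torch.load("parent_en_cs.pt"))

# Training then simply continues on the child parallel data; only the data changes.
```

Because the parent and child share the architecture and subword vocabulary, warm-starting reduces to a plain checkpoint load; no layer freezing or vocabulary surgery is required for the baseline variant of the method.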
