Transfer Learning across Languages from Someone Else's NMT Model

Neural machine translation is demanding in terms of training time, hardware resources, and the size and quantity of parallel training sentences. We propose a simple transfer learning method that recycles already trained models for different language pairs, with no need to modify the model architecture, hyper-parameters, or vocabulary. We achieve better translation quality and shorter convergence times than when training from random initialization. To show the applicability of our method, we recycle a Transformer model trained by different researchers for translating English-to-Czech and use it to seed models for seven language pairs. Our translation models are significantly better even when the re-used model's language pair is not linguistically related to the child language pair, especially for low-resource languages. Our approach needs only a single pretrained model for transfer to all the various language pairs. Additionally, we improve this approach with a simple vocabulary transformation. We analyze the behavior of transfer learning to understand the gains from unrelated languages.
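
The core of the method can be illustrated as warm-starting a child model from the parent model's checkpoint. Below is a minimal sketch of this idea in PyTorch; the file name, vocabulary size, and toy architecture are illustrative assumptions (the paper's experiments use a Transformer trained in Marian), not the authors' actual setup.

```python
# Minimal sketch of the "recycling" idea: initialize a child NMT model with the
# weights of an already-trained parent model (same architecture, hyper-parameters,
# and shared subword vocabulary), then continue training on the child language pair.
# The checkpoint name and toy model below are illustrative assumptions.
import torch
import torch.nn as nn

def build_model(vocab_size=32000, d_model=512):
    # Identical architecture for parent and child; the method requires no changes
    # to model structure, hyper-parameters, or vocabulary.
    return nn.Sequential(
        nn.Embedding(vocab_size, d_model),
        nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
        nn.Linear(d_model, vocab_size),
    )

# Parent: e.g. an English-to-Czech model trained by someone else.
parent = build_model()
torch.save(parent.state_dict(), "parent_en_cs.pt")  # stands in for a released checkpoint

# Child: e.g. a low-resource pair; identical architecture, warm-started from the parent.
child = build_model()
child.load_state_dict(torch.load("parent_en_cs.pt"))

# Training then simply continues on the child parallel data; only the data changes.
```

Because the parent and child share the architecture and subword vocabulary, warm-starting reduces to a plain checkpoint load; no layer freezing or vocabulary surgery is required for the baseline variant of the method.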
