Improving Multilingual Translation by Representation and Gradient Regularization

Multilingual Neural Machine Translation (NMT) enables a single model to serve all translation directions, including directions that are unseen during training, i.e., zero-shot translation. Despite being theoretically attractive, current models often produce low-quality translations, commonly failing even to produce outputs in the correct target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems trained on massive multilingual corpora. To address this issue, we propose a joint approach that regularizes NMT models at both the representation level and the gradient level. At the representation level, we leverage an auxiliary target-language prediction task to regularize decoder outputs so that they retain information about the target language. At the gradient level, we leverage a small amount of direct data (on the order of thousands of sentence pairs) to regularize model gradients. Our results demonstrate that this approach is highly effective in both reducing off-target translations and improving zero-shot translation performance, by +5.59 and +10.38 BLEU on the WMT and OPUS datasets respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.
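To make the representation-level regularizer concrete, below is a minimal PyTorch sketch of an auxiliary target-language prediction head applied to pooled decoder hidden states; the class name, mean-pooling scheme, and the `lambda_tlp` weighting knob are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetLanguagePredictionLoss(nn.Module):
    """Auxiliary classifier predicting the target-language ID from
    mean-pooled decoder states (illustrative sketch, not the paper's code)."""

    def __init__(self, hidden_dim: int, num_languages: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_languages)

    def forward(self, decoder_states, pad_mask, tgt_lang_ids):
        # decoder_states: (batch, seq_len, hidden_dim)
        # pad_mask:       (batch, seq_len), True at non-pad positions
        # tgt_lang_ids:   (batch,) integer language labels
        mask = pad_mask.unsqueeze(-1).float()
        pooled = (decoder_states * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        logits = self.classifier(pooled)
        return F.cross_entropy(logits, tgt_lang_ids)

# Hypothetical combined objective, with lambda_tlp as a tunable weight:
#   loss = nmt_loss + lambda_tlp * tlp_loss(decoder_states, pad_mask, lang_ids)
```

Because the classifier reads the decoder's hidden states directly, its gradient pushes those states to stay linearly separable by target language, which is one plausible way to discourage off-target generations.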

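For the gradient-level regularizer, one way to realize "using a small amount of direct data to regularize model gradients" is to project the multilingual training gradient away from directions that conflict with a reference gradient computed on a small direct-data batch, in the spirit of gradient-surgery methods. The helper below is a hedged sketch under that assumption; the function and variable names are hypothetical.

```python
import torch

@torch.no_grad()
def project_conflicting_grads(model, ref_grads, eps=1e-12):
    """For each parameter, if the current training gradient conflicts with
    (has a negative dot product against) the reference gradient from a
    small direct-data batch, remove the conflicting component so the
    update no longer opposes the direct-data direction."""
    for p, g_ref in zip(model.parameters(), ref_grads):
        if p.grad is None or g_ref is None:
            continue
        dot = torch.sum(p.grad * g_ref)
        if dot < 0:
            # g <- g - (g . g_ref / ||g_ref||^2) * g_ref, zeroing the conflict
            alpha = (dot / (g_ref.pow(2).sum() + eps)).item()
            p.grad.sub_(g_ref, alpha=alpha)
```

In a training step one would first back-propagate the loss on the small direct-data batch, snapshot `ref_grads = [p.grad.clone() if p.grad is not None else None for p in model.parameters()]`, zero the gradients, back-propagate the multilingual batch, call `project_conflicting_grads`, and only then take the optimizer step.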