Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation

Neural Machine Translation (NMT) is notorious for its need for large amounts of bilingual data. An effective approach to compensate for this requirement is Multi-Task Learning (MTL), which leverages different linguistic resources as a source of inductive bias. Current MTL architectures are based on the Seq2Seq transduction framework and (partially) share components of the model among tasks. However, this approach often suffers from task interference and cannot fully capture commonalities among subsets of tasks. We address this issue by extending the recurrent units with multiple “blocks”, together with a trainable “routing network”. The routing network enables adaptive collaboration by dynamically sharing blocks conditioned on the task at hand, the input, and the model state. Empirical evaluation on two low-resource translation tasks, English to Vietnamese and English to Farsi, shows improvements of +1 BLEU over strong baselines.
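
To make the routing idea concrete, below is a minimal sketch of a recurrent cell whose candidate state is computed by several parallel blocks, mixed by a trainable router conditioned on the task identity, the current input, and the previous hidden state. This is an illustrative reconstruction, not the authors' implementation: the PyTorch framework, the class name RoutedRecurrentCell, and all parameter names are assumptions made for exposition.

```python
import torch
import torch.nn as nn

class RoutedRecurrentCell(nn.Module):
    """Sketch of a recurrent unit with multiple blocks and soft routing.

    The router produces a distribution over blocks from the task embedding,
    the input, and the previous state; the new state is the resulting
    convex combination of the block outputs.
    """

    def __init__(self, input_size, hidden_size, num_blocks, num_tasks):
        super().__init__()
        self.task_embed = nn.Embedding(num_tasks, hidden_size)
        # Each block is a small transform of [x; h_prev] (hypothetical choice).
        self.blocks = nn.ModuleList(
            nn.Linear(input_size + hidden_size, hidden_size)
            for _ in range(num_blocks)
        )
        # Router scores each block from the task, input, and model state.
        self.router = nn.Linear(input_size + 2 * hidden_size, num_blocks)

    def forward(self, x, h_prev, task_id):
        task = self.task_embed(task_id)                         # (batch, hidden)
        route_in = torch.cat([x, h_prev, task], dim=-1)
        weights = torch.softmax(self.router(route_in), dim=-1)  # (batch, blocks)
        block_in = torch.cat([x, h_prev], dim=-1)
        outs = torch.stack(
            [torch.tanh(b(block_in)) for b in self.blocks], dim=1
        )                                                       # (batch, blocks, hidden)
        # Adaptive sharing: task/input/state-dependent mixture of blocks.
        return (weights.unsqueeze(-1) * outs).sum(dim=1)
```

Under this reading, tasks that benefit from collaboration learn overlapping routing distributions and thus share blocks, while tasks that would interfere can route to disjoint blocks; making the router trainable lets the model discover these subsets rather than fixing the sharing pattern by hand.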
