Shared-Private Bilingual Word Embeddings for Neural Machine Translation

Word embeddings are central to neural machine translation (NMT), a field that has attracted intensive research interest in recent years. In NMT, the source embedding plays the role of the entrance to the model while the target embedding acts as the terminal, and together these layers occupy most of the parameters devoted to representation learning. Yet the two embeddings interact only indirectly, through the soft-attention mechanism, which leaves them comparatively isolated from each other. In this paper, we propose shared-private bilingual word embeddings, which establish a closer relationship between the source and target embeddings while also reducing the number of model parameters. The embeddings of similar source and target words share a part of their features, so the two sides cooperatively learn these common representation units. Experiments on 5 language pairs, spanning 6 language families and 5 different alphabets, demonstrate that the proposed model delivers a significant performance boost over strong baselines with dramatically fewer model parameters.

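The abstract does not spell out the parameterization, but the core idea admits a compact sketch: each word vector is the concatenation of a language-private part and a part read from a feature table shared by both languages, with similar source and target words mapped to the same shared row. The PyTorch code below is one plausible reading under stated assumptions, not the paper's exact method; the class name SharedPrivateEmbedding, the 384/128 private/shared split, and the random slot maps are illustrative, and in practice the slot maps would come from a bilingual lexicon or word alignments identifying similar source-target words.

```python
import torch
import torch.nn as nn

class SharedPrivateEmbedding(nn.Module):
    """One side (source or target) of a shared-private embedding.

    Each word vector concatenates a language-private part with a part
    read from a feature table that both languages share. Similar source
    and target words point at the same row of the shared table, so the
    two sides cooperatively update those common representation units.
    """

    def __init__(self, vocab_size, dim, shared_table, slot_of):
        super().__init__()
        shared_dim = shared_table.embedding_dim
        self.private = nn.Embedding(vocab_size, dim - shared_dim)
        self.shared = shared_table  # the same module on both sides
        # word id -> row of the shared table (hypothetical mapping,
        # e.g. derived from a bilingual lexicon or word alignments)
        self.register_buffer("slot_of", slot_of)

    def forward(self, word_ids):
        private = self.private(word_ids)
        shared = self.shared(self.slot_of[word_ids])
        return torch.cat([private, shared], dim=-1)


# Both layers reference one shared table; the slot maps below are random
# placeholders for what a bilingual lexicon would actually provide.
shared = nn.Embedding(num_embeddings=8000, embedding_dim=128)
src_slots = torch.randint(0, 8000, (32000,))
tgt_slots = torch.randint(0, 8000, (32000,))
src_emb = SharedPrivateEmbedding(32000, 512, shared, src_slots)
tgt_emb = SharedPrivateEmbedding(32000, 512, shared, tgt_slots)

tokens = torch.randint(0, 32000, (2, 7))  # (batch, length)
print(src_emb(tokens).shape)              # torch.Size([2, 7, 512])
```

Because the shared table holds one row per cross-lingual word cluster rather than one per word and language, this construction also cuts the total number of embedding parameters, consistent with the parameter savings the abstract reports.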