Training Neural Machine Translation using Word Embedding-based Loss

In neural machine translation (NMT), the computational cost of the output layer grows with the size of the target-side vocabulary. Limiting the vocabulary size instead can cause a significant drop in translation quality. This trade-off stems from the standard softmax-based loss function, which treats in-vocabulary words independently and ignores word similarity. In this paper, we propose a novel NMT loss function that incorporates word similarity in the form of distances in a word embedding space. The proposed loss encourages the NMT decoder to generate words close to their references in the embedding space; this helps the decoder choose acceptable similar words when the actual best candidates are excluded from the vocabulary by its size limit. In experiments on the ASPEC Japanese-to-English and IWSLT17 English-to-French data sets, the proposed method improved over a standard NMT baseline on both; on IWSLT17 En-Fr in particular, it achieved gains of up to +1.72 BLEU and +1.99 METEOR. When the target-side vocabulary was limited to as few as 1,000 words, the proposed method still yielded a substantial gain of +1.72 METEOR on ASPEC Ja-En.
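To make the idea concrete, the sketch below shows one plausible way such an embedding-based loss could be combined with cross-entropy: the softmax-weighted (expected) embedding of the prediction is pulled toward the reference word's embedding via a cosine-distance penalty. This is an illustrative assumption, not the paper's exact formulation; all names (embedding_aware_loss, embed_matrix, alpha) are hypothetical.

```python
# Minimal sketch of a word embedding-based NMT loss (assumed form, not the
# paper's exact definition). Alongside standard cross-entropy, the expected
# embedding under the predicted distribution is encouraged to lie close to
# the reference word's embedding in the embedding space.
import torch
import torch.nn.functional as F

def embedding_aware_loss(logits, targets, embed_matrix, alpha=0.5):
    """
    logits:       (batch, vocab) unnormalized decoder scores
    targets:      (batch,) reference word indices
    embed_matrix: (vocab, dim) pre-trained target-side word embeddings
    alpha:        interpolation weight between the two terms (assumed)
    """
    # Standard per-token cross-entropy term.
    ce = F.cross_entropy(logits, targets)

    # Expected embedding of the prediction: softmax-weighted average of rows.
    probs = F.softmax(logits, dim=-1)          # (batch, vocab)
    expected_emb = probs @ embed_matrix        # (batch, dim)

    # Reference embeddings and a cosine-distance penalty in embedding space.
    ref_emb = embed_matrix[targets]            # (batch, dim)
    emb_dist = 1.0 - F.cosine_similarity(expected_emb, ref_emb, dim=-1)

    return ce + alpha * emb_dist.mean()

# Toy usage: 4 tokens, vocabulary of 10, 8-dimensional embeddings.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 3, 5, 7])
embed_matrix = torch.randn(10, 8)
print(embedding_aware_loss(logits, targets, embed_matrix))
```

Because the embedding term is differentiable in the decoder's output distribution, it can be dropped into ordinary teacher-forced training; the pre-trained embeddings (e.g. word2vec or GloVe) would typically be kept fixed so that the distance penalty reflects a stable notion of word similarity.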
