Improving Lexical Choice in Neural Machine Translation

We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings.
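To make the two ideas concrete, here is a minimal PyTorch sketch. The class and parameter names (FixNormOutputLayer, LexicalModule, the shared norm r) are our own illustration, not the paper's code: the fixed-norm layer rescales the decoder's context vector and every output embedding to a common norm r, so the logits become scaled cosine similarities rather than raw inner products; the lexical module is one plausible shape for a small jointly trained network over attention-weighted source embeddings whose output is added to the decoder's logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FixNormOutputLayer(nn.Module):
    """Output layer in which both the context vector and every output
    embedding are rescaled to a fixed norm r, so logits are scaled
    cosine similarities instead of raw inner products (sketch)."""
    def __init__(self, hidden_size, vocab_size, r=5.0):
        super().__init__()
        self.embed = nn.Parameter(torch.randn(vocab_size, hidden_size))
        self.r = r  # assumed constant norm; the paper fixes both norms to a constant

    def forward(self, h):
        # h: (batch, hidden_size), the decoder's context vector
        h = self.r * F.normalize(h, dim=-1)           # ||h|| = r
        w = self.r * F.normalize(self.embed, dim=-1)  # ||w_i|| = r for every word i
        return h @ w.t()                              # (batch, vocab_size) logits

class LexicalModule(nn.Module):
    """Hypothetical form of the jointly trained lexical module: a small
    feedforward net over the attention-weighted source embeddings whose
    output is added to the decoder's logits (sketch, not the paper's code)."""
    def __init__(self, embed_size, vocab_size):
        super().__init__()
        self.hidden = nn.Linear(embed_size, embed_size, bias=False)
        self.out = nn.Linear(embed_size, vocab_size)

    def forward(self, src_embeds, attn_weights):
        # src_embeds:   (batch, src_len, embed_size) source word embeddings
        # attn_weights: (batch, src_len) attention distribution at this step
        f = torch.tanh((attn_weights.unsqueeze(-1) * src_embeds).sum(dim=1))
        return self.out(torch.tanh(self.hidden(f)))   # extra logits, added to the main ones
```

With both norms fixed at r, the logit for word i reduces to r² cos(h, wᵢ), so a frequent word can no longer win simply by growing a long embedding; the lexical bypass gives rare words a direct path from source embeddings to output logits.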
