On Using Very Large Target Vocabulary for Neural Machine Translation

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation is limited in its ability to handle a large target vocabulary, since both training complexity and decoding complexity increase in proportion to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be done efficiently even with a model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained with the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use an ensemble of a few models with very large target vocabularies, we achieve state-of-the-art translation performance (measured by BLEU) on English→German translation and performance close to the state-of-the-art English→French translation systems.
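
The core idea, approximating the expensive softmax normalization over the full target vocabulary with a small sampled subset, can be illustrated with a minimal sketch. This is not the paper's exact training procedure; the vocabulary size, hidden dimension, uniform proposal distribution, and all variable names below are illustrative assumptions.

```python
# Minimal sketch of an importance-sampling approximation to the target softmax.
# All sizes, names, and the uniform proposal Q are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

V = 500_000          # full target vocabulary size (hypothetical)
d = 8                # decoder hidden dimension (hypothetical)
num_samples = 1_000  # size of the sampled subset used per update

W = rng.normal(scale=0.1, size=(V, d))   # output word representations
h = rng.normal(size=d)                   # decoder state for one target position
target = 42                              # index of the correct next word

# Proposal distribution Q over the vocabulary; uniform for simplicity.
log_q = -np.log(V)

# Draw a small subset of the vocabulary instead of touching all V words.
subset = rng.integers(0, V, size=num_samples)

# Energies are computed only for the sampled words: cost O(num_samples), not O(V).
energies = W[subset] @ h

# Importance-sampling estimate of the partition function:
#   Z = sum_w exp(e_w) = E_Q[exp(e_w) / Q(w)] ≈ (1/N) sum_{w~Q} exp(e_w) / Q(w)
log_Z_hat = np.log(np.mean(np.exp(energies - log_q)))

# Approximate log-probability of the correct word under the full softmax.
log_p_target = (W[target] @ h) - log_Z_hat
print(f"approximate log p(target | context) ≈ {log_p_target:.4f}")
```

At decoding time the same principle applies: instead of normalizing over all V words, scores are computed only over a candidate subset of the target vocabulary, which keeps per-step cost roughly independent of the full vocabulary size.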
