Deep Neural Language Models for Machine Translation

Neural language models (NLMs) have improved machine translation (MT) thanks to their ability to generalize well over long contexts. Despite the recent successes of deep neural networks in speech and vision, the common practice in MT is to incorporate NLMs with only one or two hidden layers, and it remains unclear whether adding more layers helps. In this paper, we demonstrate that deep NLMs with three or four layers outperform shallower models in both perplexity and translation quality. We combine several techniques to successfully train deep NLMs that jointly condition on both the source and target contexts. When reranking n-best lists from a strong web-forum baseline, our deep models yield an average gain of 0.5 TER / 0.5 BLEU points over a shallow NLM. We also adapt our models to a new SMS-chat domain and obtain a similar gain of 1.0 TER / 0.5 BLEU points.
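
To make the setup concrete, the sketch below illustrates, under assumed toy dimensions and randomly initialised weights, how a deep feed-forward joint NLM that conditions on a source-side window and the target history can be used to rescore an n-best list. This is a minimal illustration, not the paper's implementation: the vocabulary size, layer widths and depth, window sizes, padding token, and interpolation weight are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of a deep joint NLM used for
# n-best reranking.  All dimensions and the toy scoring setup are assumed.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 1000      # assumed toy vocabulary size
EMB   = 32        # word-embedding dimension
HID   = 64        # hidden-layer width
DEPTH = 4         # "deep" NLM: three or four hidden layers
SRC_WIN, TGT_WIN = 5, 3   # assumed source / target context window sizes

# Randomly initialised parameters stand in for a trained model.
embeddings = rng.normal(scale=0.1, size=(VOCAB, EMB))
layers = []
in_dim = (SRC_WIN + TGT_WIN) * EMB
for _ in range(DEPTH):
    layers.append((rng.normal(scale=0.1, size=(in_dim, HID)), np.zeros(HID)))
    in_dim = HID
W_out = rng.normal(scale=0.1, size=(HID, VOCAB))
b_out = np.zeros(VOCAB)

def log_prob(src_ctx, tgt_ctx, next_word):
    """Log P(next_word | source window, target history) under the toy model."""
    x = np.concatenate([embeddings[w] for w in src_ctx + tgt_ctx])
    for W, b in layers:
        x = np.maximum(0.0, x @ W + b)          # rectified linear units
    logits = x @ W_out + b_out
    logits -= logits.max()                      # stabilise the softmax
    return logits[next_word] - np.log(np.exp(logits).sum())

def rescore(src_ctx, hypothesis, base_score, weight=0.5):
    """Add the NLM log-probability of a hypothesis to its decoder score."""
    nlm = 0.0
    history = [0] * TGT_WIN                     # assumed padding token id 0
    for w in hypothesis:
        nlm += log_prob(src_ctx, history, w)
        history = history[1:] + [w]
    return base_score + weight * nlm

# Toy n-best reranking: pick the hypothesis with the best combined score.
src = [5, 17, 42, 7, 99]
nbest = [([12, 34, 56], -2.1), ([12, 35, 56], -2.3), ([13, 34, 57], -2.0)]
best = max(nbest, key=lambda h: rescore(src, h[0], h[1]))
print("selected hypothesis:", best[0])
```

In practice the NLM score would be one feature among several, with its weight tuned on a development set rather than fixed as above.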
