Decoding with Large-Scale Neural Language Models Improves Translation

We explore the application of neural language models to machine translation. We develop a new model that combines the neural probabilistic language model of Bengio et al., rectified linear units, and noise-contrastive estimation, and we incorporate it into a machine translation system both by reranking k-best lists and by direct integration into the decoder. Our large-scale, large-vocabulary experiments across four language pairs show that our neural language model improves translation quality by up to 1.1 BLEU.
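As a rough illustration of the model the abstract describes, the sketch below combines its three ingredients: a feedforward neural probabilistic language model over concatenated context embeddings, rectified linear units in the hidden layer, and a noise-contrastive estimation (NCE) objective that avoids normalizing over the full vocabulary. This is a minimal sketch, not the authors' implementation; all dimensions, the uniform noise distribution, and every name in it are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of an NPLM with ReLU hidden units,
# trained with noise-contrastive estimation. Sizes and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

V, D, H, N = 10_000, 150, 512, 4   # vocab size, embedding dim, hidden dim, context length
K = 100                            # noise samples per training example (assumption)

# Parameters, randomly initialized for the sketch.
C = rng.normal(0, 0.01, (V, D))          # input word embeddings
W1 = rng.normal(0, 0.01, (N * D, H))     # context -> hidden
b1 = np.zeros(H)
W2 = rng.normal(0, 0.01, (H, V))         # hidden -> output scores
b2 = np.zeros(V)

def score(context_ids):
    """Unnormalized log-score s(w | context) for every word w in the vocabulary."""
    x = C[context_ids].reshape(-1)       # concatenate the N context embeddings
    h = np.maximum(0.0, x @ W1 + b1)     # rectified linear units
    return h @ W2 + b2                   # no softmax: scores stay unnormalized

# Noise distribution for NCE: uniform here for simplicity (an assumption;
# an empirical unigram distribution is the more typical choice).
q = np.full(V, 1.0 / V)

def nce_loss(context_ids, target_id):
    """NCE objective: classify the true next word against K sampled noise words."""
    s = score(context_ids)
    noise_ids = rng.choice(V, size=K, p=q)
    # P(data | w) = sigmoid(s(w) - log(K * q(w))), treating the unnormalized
    # scores as self-normalized log-probabilities, as NCE permits.
    def log_sigmoid(z):
        return -np.logaddexp(0.0, -z)
    pos = log_sigmoid(s[target_id] - np.log(K * q[target_id]))
    neg = log_sigmoid(-(s[noise_ids] - np.log(K * q[noise_ids]))).sum()
    return -(pos + neg)

loss = nce_loss(rng.integers(V, size=N), target_id=42)
```

Because the scores learned this way are approximately self-normalized, such a model can be queried cheaply at test time, which is what makes both k-best reranking and direct integration into the decoder practical.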

[1] R. Kronmal et al. On the Alias Method for Generating Random Variables From a Discrete Distribution, 1979.

[2] Stanley F. Chen et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[3] Yoshua Bengio et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[4] Franz Josef Och et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.

[5] Hermann Ney et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[6] Holger Schwenk et al. Efficient training of large neural networks for language modeling, 2004, IJCNN.

[7] Jean-Luc Gauvain et al. Training Neural Network Language Models on Very Large Corpora, 2005, HLT.

[8] Yoshua Bengio et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.

[9] Holger Schwenk et al. Continuous Space Language Models for Statistical Machine Translation, 2006, ACL.

[10] David Chiang et al. Hierarchical Phrase-Based Translation, 2007, CL.

[11] Holger Schwenk et al. Continuous space language models, 2007, Comput. Speech Lang.

[12] Geoffrey E. Hinton et al. A Scalable Hierarchical Distributed Language Model, 2008, NIPS.

[13] Geoffrey E. Hinton et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.

[14] Aapo Hyvärinen et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, 2010, AISTATS.

[15] Alexandre Allauzen et al. Structured Output Layer neural network language model, 2011, ICASSP.

[16] Lukáš Burget et al. Empirical Evaluation and Combination of Advanced Language Modeling Techniques, 2011, INTERSPEECH.

[17] Alon Lavie et al. Language Model Rest Costs and Space-Efficient Storage, 2012, EMNLP.

[18] Yoshua Bengio et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.

[19] Yee Whye Teh et al. A fast and simple algorithm for training neural probabilistic language models, 2012, ICML.

[20] Jan Niehues et al. Continuous space language models using restricted Boltzmann machines, 2012, IWSLT.

[21] Tomáš Mikolov. Statistical Language Models Based on Neural Networks, 2012, Ph.D. thesis, Brno University of Technology.

[22] Holger Schwenk et al. Large, Pruned or Continuous Space Language Models on a GPU for Statistical Machine Translation, 2012, WLM@NAACL-HLT.

[23] Geoffrey E. Hinton et al. On rectified linear units for speech processing, 2013, ICASSP.

[24] Holger Schwenk et al. CSLM - a modular open-source continuous space language modeling toolkit, 2013, INTERSPEECH.