Decoder Integration and Expected BLEU Training for Recurrent Neural Network Language Models

Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task-specific metric, such as BLEU in machine translation. We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion. Furthermore, we tackle the issue of directly integrating a recurrent network into first-pass decoding under an efficient approximation. Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single-reference setup.
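
For illustration, a minimal sketch of an expected BLEU objective of this kind (the notation below is illustrative and not taken verbatim from the paper): given a source sentence $f$ with reference $e^{*}$ and an n-best list $\mathcal{E}(f)$ of candidate translations, one can maximize the model-expected sentence-level BLEU

\[
\mathrm{xBLEU}(f) \;=\; \sum_{e \in \mathcal{E}(f)} \mathrm{sBLEU}(e, e^{*})\, p(e \mid f),
\qquad
p(e \mid f) \;=\; \frac{\exp\!\big(\gamma\, s(e, f)\big)}{\sum_{e' \in \mathcal{E}(f)} \exp\!\big(\gamma\, s(e', f)\big)},
\]

where $s(e, f)$ is the translation model score including the recurrent language model, $\gamma$ is an assumed scaling constant controlling the sharpness of the distribution, and $\mathrm{sBLEU}$ is a smoothed sentence-level BLEU approximation. Training then maximizes $\sum_f \mathrm{xBLEU}(f)$ rather than minimizing the per-word cross-entropy $-\sum_i \log p(w_i \mid w_{<i})$.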
