Paper information - Sequence Level Training with Recurrent Neural Networks

Sequence Level Training with Recurrent Neural Networks

Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We address this issue by proposing a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE. On three different tasks, our approach outperforms several strong baselines for greedy generation. The method is also competitive when these baselines employ beam search, while being several times faster.
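The abstract does not spell out how the test-time metric is optimized. One standard way to train against a non-differentiable sequence-level score is a score-function (REINFORCE-style) gradient estimate, and the toy sketch below illustrates only that general idea. Everything in it is an assumption made for illustration: the policy is a table of independent per-position logits rather than a recurrent network, the reward is clipped unigram precision as a simplified stand-in for BLEU, and the function names (`unigram_precision`, `reinforce_gradient`, `train`) are hypothetical.

```python
import math
import random
from collections import Counter


def unigram_precision(hypothesis, reference):
    """Clipped unigram precision: a simplified stand-in for BLEU (assumption)."""
    if not hypothesis:
        return 0.0
    ref_counts = Counter(reference)
    hyp_counts = Counter(hypothesis)
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hypothesis)


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample one token index per position from independent softmax distributions."""
    return [rng.choices(range(len(row)), weights=softmax(row))[0] for row in logits]


def reinforce_gradient(logits, sampled, reward, baseline):
    """Score-function estimate of d E[reward] / d logits from one sampled sequence.

    For a softmax, d log p(token) / d logits = one_hot(token) - probs.
    """
    advantage = reward - baseline
    grads = []
    for row, token in zip(logits, sampled):
        probs = softmax(row)
        grads.append([advantage * ((1.0 if i == token else 0.0) - p)
                      for i, p in enumerate(probs)])
    return grads


def train(reference, vocab_size, steps=2000, lr=0.5, seed=0):
    """Toy training loop: sample a sequence, score it whole, update the logits."""
    rng = random.Random(seed)
    logits = [[0.0] * vocab_size for _ in reference]
    baseline = 0.0
    for _ in range(steps):
        sampled = sample_sequence(logits, rng)
        reward = unigram_precision(sampled, reference)
        grads = reinforce_gradient(logits, sampled, reward, baseline)
        for row, grad_row in zip(logits, grads):
            for i, g in enumerate(grad_row):
                row[i] += lr * g  # gradient ascent on expected reward
        baseline = 0.9 * baseline + 0.1 * reward  # running-average baseline
    return logits
```

The reward is computed on a whole sampled sequence, so the update reflects what the model produces when it generates from scratch instead of how well it predicts the next word given ground-truth history. Calling `train([0, 1, 2], vocab_size=3)` returns the learned per-position logits for a three-token reference.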
