Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

The encoder-decoder framework has achieved promising progress on many sequence generation tasks, including machine translation, text summarization, dialog systems, and image captioning. Such a framework adopts a one-pass forward process when decoding and generating a sequence, but lacks a deliberation process: a generated sequence is used directly as the final output without further polishing. However, deliberation is a common behavior in everyday human activities such as reading news and writing papers, articles, or books. In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation. A deliberation network has two levels of decoders: the first-pass decoder generates a raw sequence, and the second-pass decoder polishes and refines that raw sequence through deliberation. Because the second-pass deliberation decoder has global information about what the generated sequence might be, it has the potential to produce a better sequence by looking ahead at future words in the raw sequence. Experiments on neural machine translation and text summarization demonstrate the effectiveness of the proposed deliberation networks. On the WMT 2014 English-to-French translation task, our model establishes a new state-of-the-art BLEU score of 41.5.
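
The two-pass architecture described above can be made concrete with a short sketch. The snippet below is a minimal, illustrative PyTorch implementation, not the paper's exact model: the GRU layers, dot-product attention, layer sizes, greedy decoding loops, and all names (DeliberationNetwork, attend, dec1, dec2) are assumptions introduced for illustration. The one property it is meant to show matches the abstract: the second-pass decoder attends over the entire first-pass draft, so each refinement step can condition on "future" words of the raw sequence.

```python
# Minimal sketch of two-pass (deliberation) decoding. Hypothetical names and
# architecture choices; only the two-decoder structure follows the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def attend(query, keys):
    # Dot-product attention. query: (batch, hidden); keys: (batch, len, hidden).
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)   # (batch, len)
    weights = F.softmax(scores, dim=1)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)   # (batch, hidden)


class DeliberationNetwork(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # First-pass decoder: attends to the source only.
        self.dec1 = nn.GRUCell(hidden * 2, hidden)
        # Second-pass decoder: attends to the source AND the full raw draft.
        self.dec2 = nn.GRUCell(hidden * 3, hidden)
        self.out1 = nn.Linear(hidden, vocab_size)
        self.out2 = nn.Linear(hidden, vocab_size)

    def forward(self, src, max_len=20, bos=1):
        enc_states, h = self.encoder(self.embed(src))   # (batch, src_len, hidden)
        h = h.squeeze(0)
        batch = src.size(0)

        # Pass 1: generate a raw sequence left to right (greedy for brevity).
        tok = torch.full((batch,), bos, dtype=torch.long)
        raw_states, raw_tokens = [], []
        for _ in range(max_len):
            ctx = attend(h, enc_states)
            h = self.dec1(torch.cat([self.embed(tok), ctx], dim=1), h)
            tok = self.out1(h).argmax(dim=1)
            raw_states.append(h)
            raw_tokens.append(tok)
        raw_states = torch.stack(raw_states, dim=1)     # (batch, max_len, hidden)

        # Pass 2: refine. Every step attends over ALL raw-draft positions,
        # i.e. over "future" words that a one-pass decoder never sees.
        h2 = raw_states[:, -1]
        tok = torch.full((batch,), bos, dtype=torch.long)
        outputs = []
        for _ in range(max_len):
            src_ctx = attend(h2, enc_states)
            raw_ctx = attend(h2, raw_states)            # global view of the draft
            h2 = self.dec2(torch.cat([self.embed(tok), src_ctx, raw_ctx], dim=1), h2)
            logits = self.out2(h2)
            tok = logits.argmax(dim=1)
            outputs.append(logits)
        return torch.stack(raw_tokens, dim=1), torch.stack(outputs, dim=1)


# Usage: refined logits are conditioned on the whole first-pass draft.
model = DeliberationNetwork(vocab_size=1000)
raw, refined = model(torch.randint(2, 1000, (4, 7)))
print(raw.shape, refined.shape)   # torch.Size([4, 20]) torch.Size([4, 20, 1000])
```

In practice beam search would replace the greedy argmax in both passes, but the simplification preserves the key design point: attention over the stored first-pass states gives the refining decoder a global view that a single left-to-right pass cannot have.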
