Sequence Generation with Guider Network

Sequence generation with reinforcement learning (RL) has received significant attention recently. A key challenge for such methods, however, is the sparse-reward problem in RL training: a scalar guiding signal is often available only after an entire sequence has been generated. Such sparse rewards tend to ignore the global structural information of a sequence, leading to the generation of semantically inconsistent sequences. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method improves performance on both unconditional and conditional sequence-generation tasks.
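
To make the idea concrete, below is a minimal sketch (in PyTorch) of how a guider network can supply intermediate rewards: at each step it consumes a feature of the partial sequence and forecasts a feature of the eventual completed sequence, and the generator is rewarded by how closely the text it actually goes on to produce matches that forecast. The GRU-based architecture, the feature dimensions, and the cosine-similarity reward are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the guider-network idea, assuming a GRU-based guider
# and a cosine-similarity intermediate reward; all names, dimensions, and
# the reward definition are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuiderNetwork(nn.Module):
    """Models the generation environment: given a feature of the partial
    sequence generated so far, predict a feature of the completed sequence."""

    def __init__(self, feature_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.cell = nn.GRUCell(feature_dim, hidden_dim)  # recurrent environment model
        self.head = nn.Linear(hidden_dim, feature_dim)   # forecast of the future feature

    def forward(self, partial_feat: torch.Tensor, hidden: torch.Tensor):
        hidden = self.cell(partial_feat, hidden)         # update belief about the sequence
        return self.head(hidden), hidden                 # forecast + new recurrent state


def intermediate_reward(guider: GuiderNetwork,
                        partial_feat: torch.Tensor,
                        hidden: torch.Tensor,
                        future_feat: torch.Tensor):
    """Dense reward at an intermediate generation step: similarity between
    the guider's forecast and the feature of the text the generator actually
    produced afterwards (cosine similarity is an assumed choice)."""
    predicted, hidden = guider(partial_feat, hidden)
    reward = F.cosine_similarity(predicted, future_feat, dim=-1)
    return reward, hidden


# Usage sketch: in practice the features would come from a sentence encoder
# shared with the generator; random tensors stand in for them here.
guider = GuiderNetwork(feature_dim=128)
hidden = torch.zeros(4, 256)        # batch of 4 partial sequences
partial_feat = torch.randn(4, 128)  # encoding of the tokens generated so far
future_feat = torch.randn(4, 128)   # encoding of the eventually completed text
reward, hidden = intermediate_reward(guider, partial_feat, hidden, future_feat)
```

A dense per-step reward of this kind can then be combined with the end-of-sequence scalar reward inside a standard policy-gradient update, which is what lets the generator receive guidance before the sequence is finished.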
