Sequence Generation with Guider Network

Sequence generation with reinforcement learning (RL) has received significant attention recently. A key challenge for such methods, however, is the sparse-reward problem in RL training: a scalar guiding signal is often available only after an entire sequence has been generated. Such sparse rewards tend to ignore the global structural information of a sequence, leading to the generation of semantically inconsistent sequences. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method improves performance on both unconditional and conditional sequence-generation tasks.
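
To make the idea concrete, below is a minimal sketch (in PyTorch) of how a guider network can supply intermediate rewards: at each step it consumes a feature of the partial sequence and forecasts a feature of the eventual completed sequence, and the generator is rewarded by how closely the text it actually goes on to produce matches that forecast. The GRU-based architecture, the feature dimensions, and the cosine-similarity reward are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the guider-network idea, assuming a GRU-based guider
# and a cosine-similarity intermediate reward; all names, dimensions, and
# the reward definition are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GuiderNetwork(nn.Module):
    """Models the generation environment: given a feature of the partial
    sequence generated so far, predict a feature of the completed sequence."""

    def __init__(self, feature_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.cell = nn.GRUCell(feature_dim, hidden_dim)  # recurrent environment model
        self.head = nn.Linear(hidden_dim, feature_dim)   # forecast of the future feature

    def forward(self, partial_feat: torch.Tensor, hidden: torch.Tensor):
        hidden = self.cell(partial_feat, hidden)         # update belief about the sequence
        return self.head(hidden), hidden                 # forecast + new recurrent state


def intermediate_reward(guider: GuiderNetwork,
                        partial_feat: torch.Tensor,
                        hidden: torch.Tensor,
                        future_feat: torch.Tensor):
    """Dense reward at an intermediate generation step: similarity between
    the guider's forecast and the feature of the text the generator actually
    produced afterwards (cosine similarity is an assumed choice)."""
    predicted, hidden = guider(partial_feat, hidden)
    reward = F.cosine_similarity(predicted, future_feat, dim=-1)
    return reward, hidden


# Usage sketch: in practice the features would come from a sentence encoder
# shared with the generator; random tensors stand in for them here.
guider = GuiderNetwork(feature_dim=128)
hidden = torch.zeros(4, 256)        # batch of 4 partial sequences
partial_feat = torch.randn(4, 128)  # encoding of the tokens generated so far
future_feat = torch.randn(4, 128)   # encoding of the eventually completed text
reward, hidden = intermediate_reward(guider, partial_feat, hidden, future_feat)
```

A dense per-step reward of this kind can then be combined with the end-of-sequence scalar reward inside a standard policy-gradient update, which is what lets the generator receive guidance before the sequence is finished.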
