Nested-Wasserstein Distance for Sequence Generation

Reinforcement learning (RL) has been widely studied for improving sequencegeneration models. However, the conventional rewards used for RL training typically cannot capture sufficient semantic information and therefore render model bias. Further, the sparse and delayed rewards make RL exploration inefficient. To alleviate these issues, we propose the concept of nested-Wasserstein distance for measuring the distance between two policy distributions. Based on this, a novel nested-Wasserstein self-imitation learning framework is developed, encouraging the model to exploit historical high-rewarded sequences for deeper explorations and better semantic matching. Our solution can be understood as approximately executing proximal policy optimization with nested-Wasserstein trust-regions. Experiments on a variety of unconditional and conditional sequence-generation tasks demonstrate the proposed approach consistently leads to improved performance.

[1]  Joelle Pineau,et al.  Language GANs Falling Short , 2018, ICLR.

[2]  Zhe Gan,et al.  Improving Sequence-to-Sequence Learning via Optimal Transport , 2019, ICLR.

[3]  Graham Neubig,et al.  Lagging Inference Networks and Posterior Collapse in Variational Autoencoders , 2019, ICLR.

[4]  Eric P. Xing,et al.  Connecting the Dots Between MLE and RL for Sequence Generation , 2018, DeepRLStructPred@ICLR.

[5]  Xiaodong Gu,et al.  DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder , 2018, ICLR.

[6]  Qiang Liu,et al.  Learning Self-Imitating Diverse Policies , 2018, ICLR.

[7]  Zhe Gan,et al.  Adversarial Text Generation via Feature-Mover's Distance , 2018, NeurIPS.

[8]  Chen Liang,et al.  Memory Augmented Policy Optimization for Program Synthesis with Generalization , 2018, ArXiv.

[9]  Lawrence Carin,et al.  Policy Optimization as Wasserstein Gradient Flows , 2018, ICML.

[10]  Satinder Singh,et al.  Self-Imitation Learning , 2018, ICML.

[11]  Xuanjing Huang,et al.  Towards Diverse Text Generation with Inverse Reinforcement Learning , 2018, ArXiv.

[12]  Xin Wang,et al.  No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling , 2018, ACL.

[13]  Yulia Tsvetkov,et al.  Style Transfer Through Back-Translation , 2018, ACL.

[14]  Percy Liang,et al.  Delete, Retrieve, Generate: a Simple Approach to Sentiment and Style Transfer , 2018, NAACL.

[15]  Han Zhang,et al.  Improving GANs Using Optimal Transport , 2018, ICLR.

[16]  Hongyuan Zha,et al.  A Fast Proximal Point Method for Wasserstein Distance , 2018, ArXiv.

[17]  Lei Zheng,et al.  Texygen: A Benchmarking Platform for Text Generation Models , 2018, SIGIR.

[18]  Andrew M. Dai,et al.  MaskGAN: Better Text Generation via Filling in the ______ , 2018, ICLR.

[19]  Tomas Mikolov,et al.  Advances in Pre-Training Distributed Word Representations , 2017, LREC.

[20]  Xin Wang,et al.  Video Captioning via Hierarchical Reinforcement Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Yong Yu,et al.  Long Text Generation via Adversarial Training with Leaked Information , 2017, AAAI.

[22]  Gabriel Peyré,et al.  Learning Generative Models with Sinkhorn Divergences , 2017, AISTATS.

[23]  Richard Socher,et al.  A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.

[24]  Ramakanth Pasunuru,et al.  Reinforced Video Captioning with Entailment Rewards , 2017, EMNLP.

[25]  Lei Zhang,et al.  Bottom-Up and Top-Down Attention for Image Captioning and VQA , 2017, ArXiv.

[26]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[27]  Zhi Chen,et al.  Adversarial Feature Matching for Text Generation , 2017, ICML.

[28]  Kevin Lin,et al.  Adversarial Ranking for Language Generation , 2017, NIPS.

[29]  Regina Barzilay,et al.  Style Transfer from Non-Parallel Text by Cross-Alignment , 2017, NIPS.

[30]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[31]  Eric P. Xing,et al.  Controllable Text Generation , 2017, ArXiv.

[32]  Richard Socher,et al.  Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Vaibhava Goel,et al.  Self-Critical Sequence Training for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Siqi Liu,et al.  Improved Image Captioning via Policy Gradient optimization of SPIDEr , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Zhe Gan,et al.  Semantic Compositional Networks for Visual Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[37]  Joelle Pineau,et al.  An Actor-Critic Algorithm for Sequence Prediction , 2016, ICLR.

[38]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[39]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Matt J. Kusner,et al.  Supervised Word Mover's Distance , 2016, NIPS.

[41]  Alexander M. Rush,et al.  Sequence-to-Sequence Learning as Beam-Search Optimization , 2016, EMNLP.

[42]  Tao Mei,et al.  MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Marc'Aurelio Ranzato,et al.  Sequence Level Training with Recurrent Neural Networks , 2015, ICLR.

[44]  Tom Schaul,et al.  Prioritized Experience Replay , 2015, ICLR.

[45]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[46]  Ferenc Huszar,et al.  How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary? , 2015, ArXiv.

[47]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[48]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[49]  Samy Bengio,et al.  Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.

[50]  Christopher Joseph Pal,et al.  Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[52]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[53]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[54]  C. Lawrence Zitnick,et al.  CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[57]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[58]  Guy Lever,et al.  Deterministic Policy Gradient Algorithms , 2014, ICML.

[59]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[60]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[61]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[62]  C. Villani Optimal Transport: Old and New , 2008 .

[63]  Peter Dayan,et al.  Hippocampal Contributions to Control: The Third Way , 2007, NIPS.

[64]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[65]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[66]  Vijay R. Konda,et al.  Actor-Critic Algorithms , 1999, NIPS.

[67]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.