Dior-CVAE: Diffusion Priors in Variational Dialog Generation

Conditional variational autoencoders (CVAEs) have recently been used for diverse response generation, introducing latent variables to represent the relationship between a dialog context and its potential responses. However, the diversity of the responses generated by a CVAE is limited by the oversimplified assumption of an isotropic Gaussian prior. We propose Dior-CVAE, a hierarchical CVAE model with an informative prior produced by a diffusion model. Dior-CVAE derives a series of layer-wise latent variables using an attention mechanism and infuses them into the corresponding decoder layers. To alleviate posterior collapse, we propose memory dropout during latent infusion. The prior distribution of the latent variables is parameterized by a diffusion model, which introduces a multimodal distribution. Experiments on two popular open-domain dialog datasets show the advantages of our approach over previous Transformer-based variational dialog models for dialog response generation. We publicly release the code for reproducing Dior-CVAE and all baselines at https://github.com/SkyFishMoon/Latent-Diffusion-Response-Generation.
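To make the latent infusion and memory dropout idea concrete, the sketch below gives one plausible reading of the abstract in minimal PyTorch. It is a hypothetical illustration, not the released implementation: the module name `LatentMemoryInfusion`, the projection `latent_proj`, and the rate `memory_dropout_p` are all assumptions. The idea shown is that a layer-wise latent is projected into an extra memory slot for a decoder layer, and ordinary memory slots are randomly dropped during training so the decoder cannot ignore the latent path.

```python
import torch
import torch.nn as nn


class LatentMemoryInfusion(nn.Module):
    """Hypothetical sketch: project a layer-wise latent z_l into one extra
    memory slot prepended to a decoder layer's cross-attention memory.
    During training, ordinary memory slots are randomly dropped
    ("memory dropout") so the decoder is pushed to use the latent slot,
    which is one way to mitigate posterior collapse."""

    def __init__(self, latent_dim: int, hidden_dim: int, memory_dropout_p: float = 0.15):
        super().__init__()
        self.latent_proj = nn.Linear(latent_dim, hidden_dim)  # hypothetical projection
        self.memory_dropout_p = memory_dropout_p

    def forward(self, z_l: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # z_l:    (batch, latent_dim)        latent variable for this decoder layer
        # memory: (batch, src_len, hidden)   encoder outputs used as cross-attention memory
        latent_slot = self.latent_proj(z_l).unsqueeze(1)  # (batch, 1, hidden)
        if self.training:
            # Randomly zero out ordinary memory positions, leaving the latent slot intact.
            keep = (torch.rand(memory.size(0), memory.size(1), 1,
                               device=memory.device) > self.memory_dropout_p).to(memory.dtype)
            memory = memory * keep
        # The decoder layer then cross-attends over src_len + 1 slots.
        return torch.cat([latent_slot, memory], dim=1)
```

In this sketch, a decoder layer would attend over the returned sequence in place of the plain encoder memory; how the latents are actually fused and dropped in Dior-CVAE is specified in the paper and released code.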
