Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation

Diffusion models, a new paradigm in generative modeling, have achieved great success in image, audio, and video generation. However, given the discrete, categorical nature of text, extending continuous diffusion models to natural language is non-trivial. In this work, we propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation built on an encoder-decoder Transformer architecture. To improve generation quality, SeqDiffuSeq is equipped with the self-conditioning technique and our newly proposed adaptive noise schedule. Self-conditioning lets SeqDiffuSeq reuse its own predicted sequence information during the generation process, while the adaptive noise schedule balances the difficulty of denoising across time steps at the token level. Experimental results on five sequence-to-sequence generation tasks show improvements over other diffusion-based models in both text quality and inference time. We have released our code.
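To make the self-conditioning idea concrete, here is a minimal sketch of a self-conditioned reverse pass in embedding space: at every denoising step the network receives its own previous estimate of the clean token embeddings as an extra input, so it refines an earlier prediction instead of predicting from scratch. The `SelfCondDenoiser` module, the toy linear update, and all names below are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class SelfCondDenoiser(nn.Module):
    """Toy denoiser that consumes the noisy latent z_t concatenated with
    the previous clean-sequence estimate x0_prev (self-conditioning).
    A stand-in for the paper's encoder-decoder Transformer."""

    def __init__(self, dim: int):
        super().__init__()
        # Input width is 2 * dim: noisy latent + previous x0 estimate.
        self.net = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, z_t: torch.Tensor, x0_prev: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_t, x0_prev], dim=-1))

@torch.no_grad()
def sample(model: nn.Module, z_T: torch.Tensor, num_steps: int) -> torch.Tensor:
    """Reverse diffusion with self-conditioning. The interpolation update
    is a crude placeholder for a proper sampler step (e.g., DDIM)."""
    z_t = z_T
    x0_hat = torch.zeros_like(z_T)  # no previous estimate at the first step
    for t in reversed(range(num_steps)):
        # Feed the previous prediction back in as the self-conditioning input.
        x0_hat = model(z_t, x0_hat)
        alpha = t / num_steps  # toy linear schedule, not the adaptive one
        z_t = alpha * z_t + (1 - alpha) * x0_hat
    return x0_hat

model = SelfCondDenoiser(dim=16)
out = sample(model, torch.randn(2, 8, 16), num_steps=10)  # (batch, seq, dim)
print(out.shape)
```

During training, the usual recipe for self-conditioning is to zero out the extra input on half of the batches and use a detached first-pass prediction on the other half, so the model learns to work both with and without a previous estimate.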
