StyleDGPT: Stylized Response Generation with Pre-trained Language Models

Generating responses in a desired style has great potential to extend the applications of open-domain dialogue systems, yet is held back by the lack of parallel data for training. In this work, we explore this challenging task with pre-trained language models, which have brought breakthroughs to various natural language tasks. To this end, we introduce a KL loss and a style classifier into the fine-tuning step in order to steer response generation towards the target style at both the word level and the sentence level. Comprehensive empirical studies on two public datasets indicate that our model can significantly outperform state-of-the-art methods in terms of both style consistency and contextual coherence.

Wei Wu | Can Xu | Zhoujun Li | Ze Yang | Jiaqi Bai | Wei Wang | Liran Wang | Xinnian Liang
