AuGPT: Dialogue with Pre-trained Language Models and Data Augmentation

Attention-based pre-trained language models such as GPT-2 have brought considerable progress to end-to-end dialogue modelling. However, they also pose risks for task-oriented dialogue, such as a lack of knowledge grounding or output diversity. To address these issues, we introduce modified training objectives for language model fine-tuning, and we employ massive data augmentation via back-translation to increase the diversity of the training data. We further examine the possibilities of combining data from multiple sources to improve performance on the target dataset. We carefully evaluate our contributions with both human and automatic methods. Our model achieves state-of-the-art performance on the MultiWOZ data and shows competitive performance in human evaluation.
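The back-translation augmentation described above can be sketched as a simple pipeline: each training utterance is translated into a pivot language and back into English, and the resulting paraphrase replaces the original with some probability. The `translate` helper below is a hypothetical placeholder for a real machine translation system (the paper's actual models and pivot languages may differ); the rest shows the structure of the augmentation step, not the authors' implementation.

```python
import random

def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder for a real MT system (e.g. an encoder-decoder model).
    # Here we only tag the text so the pipeline runs end to end.
    return f"[{src}->{tgt}] {text}"

def back_translate(utterance: str, pivot: str = "de") -> str:
    # English -> pivot -> English yields a paraphrase of the input.
    pivot_text = translate(utterance, "en", pivot)
    return translate(pivot_text, pivot, "en")

def augment(utterances, p: float = 0.5, seed: int = 0):
    # Replace each utterance with its back-translated paraphrase
    # with probability p, increasing surface diversity of the data.
    rng = random.Random(seed)
    return [back_translate(u) if rng.random() < p else u
            for u in utterances]
```

With a real translation model plugged in, several pivot languages can be used per utterance to multiply the number of paraphrases, while delexicalized slot values are typically kept fixed so the dialogue annotation stays valid.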
