Pretraining the Noisy Channel Model for Task-Oriented Dialogue

Abstract

Direct decoding for task-oriented dialogue is known to suffer from the explaining-away effect, manifested in models that prefer short and generic responses. Here we argue for the use of Bayes’ theorem to factorize the dialogue task into two models: the distribution of the context given the response, and the prior over the response itself. This approach, an instantiation of the noisy channel model, both mitigates the explaining-away effect and allows the principled incorporation of large pretrained models for the response prior. We present extensive experiments showing that a noisy channel model decodes better responses than direct decoding, and that a two-stage pretraining strategy, employing both open-domain and task-oriented dialogue data, improves over randomly initialized models.
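To make the factorization concrete, the standard noisy channel decomposition applies Bayes’ rule with x the dialogue context and y a candidate response:

```latex
% Bayes' rule factorization of the response task.
% x = dialogue context, y = candidate response.
p(y \mid x) \;=\; \frac{p(x \mid y)\, p(y)}{p(x)}
            \;\propto\; \underbrace{p(x \mid y)}_{\text{channel model}}
                        \cdot \underbrace{p(y)}_{\text{response prior}}

% p(x) is constant in y, so decoding maximizes the channel score plus the prior:
y^{\ast} \;=\; \operatorname*{arg\,max}_{y}\; \log p(x \mid y) + \log p(y)
```

The channel model p(x | y) requires the response to account for the entire context, so a short, generic response that scores well under a direct model p(y | x) is penalized; this is the sense in which the factorization mitigates the explaining-away effect. Because an exact argmax over responses is intractable, noisy channel systems typically approximate it by rescoring a small set of candidates proposed by a direct model, as in prior noisy channel work for machine translation. Below is a minimal reranking sketch; the scorer callables, the interpolation weights lam and mu, and the stand-in toy scorers are illustrative assumptions, not the paper’s exact decoding objective.

```python
def noisy_channel_rerank(context, candidates, channel_lp, prior_lp,
                         direct_lp=None, lam=1.0, mu=1.0):
    """Pick the candidate response maximizing the noisy channel score.

    channel_lp(x, y) -> log p(x | y)   (channel model)
    prior_lp(y)      -> log p(y)       (response prior, e.g. a pretrained LM)
    direct_lp(x, y)  -> log p(y | x)   (optional direct-model term)

    The interpolation weights `lam`/`mu` and the optional direct term follow
    a common noisy-channel reranking recipe, assumed here for illustration.
    """
    def score(y):
        s = lam * channel_lp(context, y) + mu * prior_lp(y)
        if direct_lp is not None:
            s += direct_lp(context, y)
        return s

    return max(candidates, key=score)


# Toy usage with stand-in scorers; a real system would use trained models.
if __name__ == "__main__":
    context = "i am looking for a cheap hotel in the north with free parking ."
    candidates = [
        "okay .",
        "the ashley hotel is a cheap choice in the north and offers free parking .",
    ]
    best = noisy_channel_rerank(
        context,
        candidates,
        # Stand-in channel score: penalize responses that leave much of the
        # context unexplained (proxied here by a length mismatch).
        channel_lp=lambda x, y: -abs(len(x.split()) - len(y.split())),
        # Stand-in prior: a mild length penalty in place of a pretrained LM.
        prior_lp=lambda y: -0.1 * len(y.split()),
    )
    print(best)  # the informative response wins despite its weaker prior
```

In practice the candidate set comes from beam search or sampling under the direct model, and the channel model and prior are queried only for that short list, which keeps decoding tractable.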
