Improved Goal Oriented Dialogue via Utterance Generation and Look Ahead

Goal-oriented dialogue systems have become a prominent customer-care interaction channel for most businesses. However, not all interactions are smooth, and misunderstanding of customer intent is a major cause of dialogue failure. We show that intent prediction can be improved by training a deep text-to-text neural model to generate successive user utterances from unlabeled dialogue data. To this end, we define a multi-task training regime that uses successive user-utterance generation to improve intent prediction. Our approach achieves the reported improvement through two complementary factors: first, it exploits a large amount of unlabeled dialogue data for the auxiliary generation task; second, it uses the generated user utterance as an additional signal for the intent prediction model. Lastly, we present a novel look-ahead approach that uses user-utterance generation to improve intent prediction at inference time. Specifically, we generate counterfactual successive user utterances for conversations with ambiguous predicted intents, and disambiguate the prediction by reassessing the concatenated sequence of available and generated utterances.
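The look-ahead step described above can be sketched as follows. This is a minimal illustration of the control flow only: `predict_intent` and `generate_next_utterance` are hypothetical stand-ins for the paper's fine-tuned text-to-text model (here implemented as toy keyword rules so the sketch is runnable), and the ambiguity margin is an assumed hyperparameter.

```python
# Hedged sketch of inference-time look-ahead for intent disambiguation.
# Both functions below are hypothetical toy stand-ins, NOT the paper's model.

def predict_intent(dialogue: str) -> dict[str, float]:
    # Toy scorer: a real system would run the intent-prediction model.
    if "money back" in dialogue:
        return {"refund_request": 0.88, "cancel_order": 0.07}
    if "refund" in dialogue and "cancel" in dialogue:
        return {"refund_request": 0.45, "cancel_order": 0.42}
    if "refund" in dialogue:
        return {"refund_request": 0.90, "cancel_order": 0.05}
    return {"cancel_order": 0.90, "refund_request": 0.05}

def generate_next_utterance(dialogue: str) -> str:
    # Toy generator: a real system would decode a counterfactual
    # successive user utterance from the seq2seq model.
    return "I want my money back, not a replacement."

def predict_with_lookahead(dialogue: str, margin: float = 0.1) -> str:
    scores = predict_intent(dialogue)
    top2 = sorted(scores.values(), reverse=True)[:2]
    if top2[0] - top2[1] >= margin:
        # Confident prediction: no look-ahead needed.
        return max(scores, key=scores.get)
    # Ambiguous: generate a counterfactual next user utterance and
    # re-score the concatenation of available and generated utterances.
    extended = dialogue + " " + generate_next_utterance(dialogue)
    scores = predict_intent(extended)
    return max(scores, key=scores.get)

print(predict_with_lookahead("Hi, I'd like to cancel and get a refund."))
# → refund_request
```

The generation step is invoked only when the top two intent scores are close, so the extra decoding cost is paid only on ambiguous conversations.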
