Chat Detection in an Intelligent Assistant: Combining Task-oriented and Non-task-oriented Spoken Dialogue Systems

Recently emerged intelligent assistants on smartphones and home electronics (e.g., Siri and Alexa) can be seen as novel hybrids of domain-specific task-oriented spoken dialogue systems and open-domain non-task-oriented ones. To realize such hybrid dialogue systems, this paper investigates determining whether or not a user is going to have a chat with the system. To address the lack of benchmark datasets for this task, we construct a new dataset consisting of 15; 160 utterances collected from the real log data of a commercial intelligent assistant (and will release the dataset to facilitate future research activity). In addition, we investigate using tweets and Web search queries for handling open-domain user utterances, which characterize the task of chat detection. Experiments demonstrated that, while simple supervised methods are effective, the use of the tweets and search queries further improves the F1-score from 86.21 to 87.53.

[1]  Joseph Weizenbaum,et al.  ELIZA—a computer program for the study of natural language communication between man and machine , 1966, CACM.

[2]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[3]  Jianfeng Gao,et al.  A Persona-Based Neural Conversation Model , 2016, ACL.

[4]  Andreas Stolcke,et al.  A comparative study of neural network models for lexical intent classification , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[5]  Gary Geunbae Lee,et al.  Example-based dialog modeling for practical multi-domain dialog system , 2009, Speech Commun..

[6]  Jianfeng Gao,et al.  A Diversity-Promoting Objective Function for Neural Conversation Models , 2015, NAACL.

[7]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[8]  Houfeng Wang,et al.  A Joint Model of Intent Determination and Slot Filling for Spoken Language Understanding , 2016, IJCAI.

[9]  Alan Ritter,et al.  Data-Driven Response Generation in Social Media , 2011, EMNLP.

[10]  Gökhan Tür,et al.  Intent detection using semantically enriched word embeddings , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[11]  David R. Traum,et al.  A reranking approach for recognition and classification of speech input in conversational dialogue systems , 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[12]  Imed Zitouni,et al.  Predicting User Satisfaction with Intelligent Assistants , 2016, SIGIR.

[13]  Gökhan Tür,et al.  What is left to be understood in ATIS? , 2010, 2010 IEEE Spoken Language Technology Workshop.

[14]  Tong Zhang,et al.  Effective Use of Word Order for Text Categorization with Convolutional Neural Networks , 2014, NAACL.

[15]  Hao Tian,et al.  Policy Learning for Domain Selection in an Extensible Multi-domain Spoken Dialogue System , 2014, EMNLP.

[16]  Hang Li,et al.  “ Tony ” DNN Embedding for “ Tony ” Selective Read for “ Tony ” ( a ) Attention-based Encoder-Decoder ( RNNSearch ) ( c ) State Update s 4 SourceVocabulary Softmax Prob , 2016 .

[17]  Young-Bum Kim,et al.  Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform , 2016, NAACL.

[18]  Alan Ritter,et al.  Unsupervised Modeling of Twitter Conversations , 2010, NAACL.

[19]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[20]  Masahiro Araki,et al.  Towards Taxonomy of Errors in Chat-oriented Dialogue Systems , 2015, SIGDIAL Conference.

[21]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[22]  Kugatsu Sadamitsu,et al.  Building a conversational model from two-tweets , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[23]  Steve J. Young,et al.  Partially observable Markov decision processes for spoken dialog systems , 2007, Comput. Speech Lang..

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Imed Zitouni,et al.  Automatic Online Evaluation of Intelligent Assistants , 2015, WWW.

[26]  Imed Zitouni,et al.  Understanding User Satisfaction with Intelligent Assistants , 2016, CHIIR.

[27]  Hayato Kobayashi,et al.  Effects of Game on User Engagement with Spoken Dialogue System , 2015, SIGDIAL Conference.

[28]  Geoffrey Zweig,et al.  Joint semantic utterance classification and slot filling with recursive neural networks , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[29]  Ruhi Sarikaya An overview of the system architecture and key components The Technology Behind Personal Digital Assistants , 2022 .

[30]  Hang Li,et al.  Neural Responding Machine for Short-Text Conversation , 2015, ACL.

[31]  Tong Zhang,et al.  Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding , 2015, NIPS.

[32]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[33]  Aapo Hyvärinen,et al.  Noise-contrastive estimation: A new estimation principle for unnormalized statistical models , 2010, AISTATS.

[34]  Daisuke Kawahara,et al.  Large-Scale Acquisition of Commonsense Knowledge via a Quiz Game on a Dialogue System , 2016 .

[35]  Nobuhiro Kaji,et al.  Prediction of Prospective User Engagement with Intelligent Assistants , 2016, ACL.

[36]  Nobuhiro Kaji,et al.  Predicting and Eliciting Addressee's Emotion in Online Dialogue , 2013, ACL.

[37]  Raquel Fernández,et al.  Clarifying Intentions in Dialogue: A Corpus Study , 2015, IWCS.

[38]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[39]  Jianfeng Gao,et al.  A Neural Network Approach to Context-Sensitive Generation of Conversational Responses , 2015, NAACL.

[40]  Richard S. Wallace,et al.  The Anatomy of A.L.I.C.E. , 2009 .

[41]  Quoc V. Le,et al.  A Neural Conversational Model , 2015, ArXiv.

[42]  Ruhi Sarikaya,et al.  Contextual domain classification in spoken language understanding systems using recurrent neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[43]  Lukás Burget,et al.  Strategies for training large scale neural network language models , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.

[44]  Zhoujun Li,et al.  DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents , 2016, ACL.

[45]  Ryuichiro Higashinaka,et al.  Controlling Listening-oriented Dialogue using Partially Observable Markov Decision Processes , 2010, COLING.