IQ-Net: A DNN Model for Estimating Interaction-level Dialogue Quality with Conversational Agents

An automated metric for evaluating dialogue quality is critical for continuously optimizing large-scale conversational agent systems such as Alexa. Previous approaches to this problem often rely on a limited set of manually designed and/or heuristic features, which cannot easily be scaled to a large number of domains or scenarios. In this paper, we present Interaction-Quality-Network (IQ-Net), a novel DNN model that predicts interaction-level dialogue quality directly from raw dialogue contents and system metadata, without human-engineered NLP features. The IQ-Net architecture is compatible with several pre-trained neural network embeddings and architectures such as CNN, ELMo, and BERT. Through an ablation study in Alexa, we demonstrate that several variants of IQ-Net outperform a baseline model with manually engineered features (3.89% improvement in F1 score, 3.15% in accuracy, and 6.1% in precision score), while also reducing the effort required to extend to new domains and use cases.
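As a reminder of what the three reported metrics measure, the following minimal Python sketch computes precision, accuracy, and F1 for a binary quality label. The abstract does not specify the label scheme, so treating 1 as the positive class is an assumption, and the function name `binary_metrics` and the toy labels are hypothetical, not taken from the paper's data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision, recall, F1, and accuracy for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels for illustration only.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)
print(toy_scores)
```

Precision penalizes only false positives and accuracy weights all errors equally, which is why a model comparison can show a different margin on each metric.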
