Predicting Interaction Quality in Customer Service Dialogs

In this paper, we apply the Interaction Quality (IQ) dialog evaluation framework to human-computer customer service dialogs. The IQ framework predicts user satisfaction at the utterance level, which makes it useful for online adaptation of dialog system behavior and for increasing user engagement through personalization. We annotated a dataset of 120 human-computer dialogs from two customer service application domains with IQ scores. Our inter-annotator agreement (\(\rho = 0.72/0.66\)) is similar to the agreement observed on the IQ annotations of the publicly available LEGO bus information corpus. An in-domain SVM model trained on a small set of call center dialogs predicts IQ with a correlation of \(\rho = 0.53/0.56\) against the annotated IQ scores. A generic model trained exclusively on public LEGO data achieves 94%/65% of the in-domain model's performance. An adapted model, built by extending the public dataset with a small set of dialogs from the target domain, achieves 102%/81% of the in-domain model's performance.
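The evaluation setup described above (an SVM regressor over utterance-level dialog features, scored by Spearman's \(\rho\) against annotated IQ) can be sketched as follows. This is an illustrative assumption of the pipeline, not the authors' code: the feature set, data split, and hyperparameters are placeholders, and the IQ labels here are synthetic.

```python
# Hypothetical sketch of the IQ-prediction setup: train an SVM regressor
# on per-utterance dialog features and evaluate with Spearman's rho
# against annotated IQ scores. Feature names and data are illustrative.
import numpy as np
from sklearn.svm import SVR
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Toy stand-in for utterance-level features (e.g. ASR confidence,
# number of reprompts, turn index) and IQ labels on the 1-5 scale.
n_utts, n_feats = 200, 6
X = rng.normal(size=(n_utts, n_feats))
iq = np.clip(
    np.round(3 + X[:, 0] + 0.5 * X[:, 1]
             + rng.normal(scale=0.5, size=n_utts)),
    1, 5,
)

# Hold out the last utterances as a test set; fit an RBF-kernel SVR
# as the "in-domain" model.
X_tr, X_te = X[:150], X[150:]
y_tr, y_te = iq[:150], iq[150:]
model = SVR(kernel="rbf", C=1.0).fit(X_tr, y_tr)

# Spearman's rho between predicted and annotated IQ scores.
rho, _ = spearmanr(model.predict(X_te), y_te)
print(f"Spearman rho: {rho:.2f}")
```

The LEGO-only "generic" and LEGO-plus-target-domain "adapted" models correspond to swapping the training rows fed to `fit` while evaluating on the same held-out in-domain utterances.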
