Analysis of Temporal Features for Interaction Quality Estimation

Many approaches for estimating the Interaction Quality (IQ) of Spoken Dialogue Systems have been investigated. Although dialogues are clearly sequential in nature, statistical classification approaches designed for sequential problems do not seem to perform better on automatic IQ estimation than static approaches, i.e., approaches that treat each turn as independent of the dialogue it belongs to. We therefore analyse this effect by investigating the subset of temporal features used as input for the statistical classification of IQ. We extend the set of temporal features to cover both the system view and the user view. We determine the contribution of each feature sub-group and show that temporal features contribute most to the classification performance. Furthermore, for the feature sub-group that models temporal effects with a window, we modify the window size, which significantly increases the overall performance by 15.69%.
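To make the idea of window-based temporal features concrete, the following is a minimal sketch, not the implementation used in this work. It assumes each turn carries a hypothetical binary flag (for example, whether speech recognition failed in that turn) and derives two temporal features from it: a count over the last `window` turns and a cumulative count over the dialogue so far. The function names and the example flag are illustrative assumptions.

```python
from typing import List, Sequence


def window_counts(flags: Sequence[int], window: int) -> List[int]:
    """For each turn, count flagged turns among the current turn
    and up to (window - 1) directly preceding turns."""
    if window < 1:
        raise ValueError("window must be at least 1")
    counts: List[int] = []
    running = 0
    for i, flag in enumerate(flags):
        running += flag
        # Drop the turn that just fell out of the window.
        if i >= window:
            running -= flags[i - window]
        counts.append(running)
    return counts


def dialogue_counts(flags: Sequence[int]) -> List[int]:
    """For each turn, count flagged turns from the dialogue start
    up to and including the current turn."""
    counts: List[int] = []
    running = 0
    for flag in flags:
        running += flag
        counts.append(running)
    return counts


# Hypothetical per-turn flags for one dialogue (1 = problem in that turn).
example_flags = [0, 1, 1, 0, 1]
window_3 = window_counts(example_flags, window=3)
window_2 = window_counts(example_flags, window=2)
cumulative = dialogue_counts(example_flags)
```

Varying `window` changes how much recent history each turn's feature vector summarises, which is the parameter the abstract refers to when it speaks of modifying the window size. A static classifier can then consume these per-turn features without itself modelling the sequence.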
