On the Applicability of a User Satisfaction-Based Reward for Dialogue Policy Learning

Finding a good dialogue policy using reinforcement learning usually relies on objective criteria, e.g., task success, for modelling the reward signal. In this contribution, we propose to use user satisfaction instead, represented by the Interaction Quality (IQ) metric. Comparing the user satisfaction-based reward to a task success baseline, we show that IQ is a viable alternative for reward modelling: a reward function designed using IQ may yield similar or even better performance than one using task success. This is demonstrated in an evaluation with a user simulator and a live IQ estimation module.
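The following minimal sketch contrasts the two reward signals. It is not the authors' implementation. It assumes a per-turn penalty of -1 and a final bonus of 20 for task success. It also assumes that the IQ estimate at the end of the dialogue, on IQ's five-point scale, is mapped linearly onto the same 0 to 20 range. These constants and the function names are illustrative assumptions. In the setting described above, `final_iq` would come from the live IQ estimation module. Here it is simply passed in as an argument.

```python
# Illustrative assumptions (not taken from the abstract): per-turn penalty,
# success bonus, and a linear mapping of the final IQ onto the bonus range.
TURN_PENALTY = -1.0
SUCCESS_REWARD = 20.0
IQ_MIN, IQ_MAX = 1, 5  # IQ is rated on a five-point scale


def task_success_reward(num_turns: int, success: bool) -> float:
    """Baseline reward: penalise dialogue length, add a bonus if the task succeeded."""
    return TURN_PENALTY * num_turns + (SUCCESS_REWARD if success else 0.0)


def iq_reward(num_turns: int, final_iq: float) -> float:
    """User satisfaction-based reward: replace the success bonus with the final
    IQ estimate, linearly scaled from [IQ_MIN, IQ_MAX] to [0, SUCCESS_REWARD]."""
    if not IQ_MIN <= final_iq <= IQ_MAX:
        raise ValueError(f"IQ must lie in [{IQ_MIN}, {IQ_MAX}], got {final_iq}")
    scaled = (final_iq - IQ_MIN) / (IQ_MAX - IQ_MIN) * SUCCESS_REWARD
    return TURN_PENALTY * num_turns + scaled


if __name__ == "__main__":
    print(task_success_reward(10, True))  # successful 10-turn dialogue
    print(iq_reward(10, 5))               # 10-turn dialogue, best IQ rating
    print(iq_reward(10, 3))               # 10-turn dialogue, medium IQ rating
```

Under these assumptions, a successful 10-turn dialogue and a 10-turn dialogue that ends with the best IQ rating both earn 10.0. The two signals therefore operate on comparable scales. The IQ-based variant additionally distinguishes degrees of satisfaction: a final IQ of 3 yields 0.0 instead of an all-or-nothing bonus.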
