Learning suitable, well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most reinforcement-learning-based work models the reward signal with an objective measure such as task success, we propose a reward based on user satisfaction. In simulated experiments, we show that a live user-satisfaction estimation model can be applied, resulting in higher estimated satisfaction while achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain can be applied to many other domains that cover a similar task. We verify our findings by employing the model in one of the domains to learn a policy from real users, and we compare its performance to that of policies whose rewards use user satisfaction and task success acquired directly from the users.
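
The abstract contrasts a task-success-based reward with one driven by an estimated satisfaction score. As a minimal sketch, and purely as an assumption since the paper's exact mapping is not reproduced here, a per-dialogue reward in this style might normalise an Interaction-Quality-like estimate (on a 1–5 scale) into the range of a conventional success bonus (e.g. 20 for success in PyDial-style setups) and subtract a per-turn penalty:

```python
def satisfaction_reward(iq_estimate: int, num_turns: int,
                        iq_max: int = 5, scale: float = 20.0,
                        turn_penalty: float = 1.0) -> float:
    """Hypothetical final-turn reward driven by an estimated
    satisfaction score instead of binary task success.

    iq_estimate  -- satisfaction estimate on a 1..iq_max ordinal scale
    num_turns    -- dialogue length, penalised to encourage brevity
    scale        -- maximum satisfaction bonus (assumed, not from the paper)
    """
    if not 1 <= iq_estimate <= iq_max:
        raise ValueError(f"iq_estimate must be in [1, {iq_max}]")
    # Normalise the ordinal estimate to [0, 1], scale it, subtract turn cost.
    satisfaction_bonus = scale * (iq_estimate - 1) / (iq_max - 1)
    return satisfaction_bonus - turn_penalty * num_turns
```

A fully satisfied user (estimate 5) after 10 turns would yield 20.0 − 10.0 = 10.0 under these assumed parameters; the names `satisfaction_reward`, `scale`, and `turn_penalty` are illustrative, not taken from the paper.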