The use of discriminative belief tracking in POMDP-based dialogue systems

Statistical spoken dialogue systems based on Partially Observable Markov Decision Processes (POMDPs) have been shown to be more robust to speech recognition errors because they maintain a belief distribution over multiple dialogue states and make policy decisions based on the entire distribution rather than on the single most likely hypothesis. To date, most POMDP-based systems have used generative trackers. However, concerns about modelling accuracy have created interest in discriminative methods, and recent results from the second Dialog State Tracking Challenge (DSTC2) have shown that discriminative trackers can significantly outperform generative models in terms of tracking accuracy. The aim of this paper is to investigate the extent to which these improvements translate into improved task completion rates when a discriminative tracker is incorporated into a spoken dialogue system. To do this, the Recurrent Neural Network (RNN) tracker described by Henderson et al. in DSTC2 was integrated into the Cambridge statistical dialogue system and compared with the existing generative Bayesian network tracker. Using a Gaussian Process (GP) based policy, the experimental results indicate that the system using the RNN tracker performs significantly better than the system with the original Bayesian network tracker.
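To make the idea of a belief distribution concrete, the following is a minimal illustrative sketch of tracking a belief over the values of a single slot across turns, compared with trusting the top recognition hypothesis of the latest turn. It is not the Bayesian network tracker or the RNN tracker discussed here. The function name `update_belief`, the slot values, the confidence scores, and the simple likelihood model (an unchanging user goal, with unassigned confidence mass spread uniformly over all values) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def update_belief(
    belief: Dict[str, float], nbest: List[Tuple[str, float]]
) -> Dict[str, float]:
    """One Bayesian-style belief update for a single slot (toy model).

    Assumptions (illustrative only): the user goal does not change between
    turns, and the likelihood of a value is its N-best confidence plus an
    equal share of whatever confidence mass the N-best list left unassigned.
    """
    scores = dict(nbest)
    residual = max(0.0, 1.0 - sum(scores.values()))
    floor = residual / len(belief)
    unnorm = {v: p * (scores.get(v, 0.0) + floor) for v, p in belief.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        # No value is compatible with the observation: keep the old belief.
        return dict(belief)
    return {v: u / total for v, u in unnorm.items()}


# Hypothetical slot values and N-best lists for two user turns.
values = ["chinese", "indian", "italian"]
turns = [
    [("italian", 0.9)],                   # turn 1: clear recognition
    [("indian", 0.5), ("italian", 0.4)],  # turn 2: noisy recognition
]

belief = {v: 1.0 / len(values) for v in values}  # uniform prior
for nbest in turns:
    belief = update_belief(belief, nbest)

one_best = max(turns[-1], key=lambda pair: pair[1])[0]  # latest top hypothesis
tracked = max(belief, key=belief.get)                   # most probable value
```

In this toy example the top hypothesis of the noisy second turn is `indian`, whereas the accumulated belief still favours `italian`. A policy that conditions on the whole distribution can also see how much probability mass the competing value holds, for example when deciding whether to ask for confirmation.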
