Regularized Neural User Model for Goal-Oriented Spoken Dialogue Systems

User simulation is widely used to generate artificial dialogues, both for training statistical spoken dialogue systems and for evaluating them. This paper presents a neural network approach to user modeling that uses a bidirectional encoder-decoder architecture with a regularization layer for each dialogue act. To reduce the impact of data sparsity, the dialogue act space is compressed according to the user goal. Experiments on the Dialogue State Tracking Challenge 2 (DSTC2) dataset show significant results for both dialogue act and slot-level predictions, outperforming previous neural user modeling approaches in terms of F1 score.
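One way to picture goal-based compression of the dialogue act space is a minimal sketch under a hypothetical scheme. In this scheme, each concrete slot value is replaced by a placeholder that records only whether it matches the user goal ("goal_value"), contradicts it ("other_value"), or refers to a slot the goal does not constrain ("unconstrained"). The sketch also scores predicted acts against reference acts with a set-based act-level F1. The function names `compress_act`, `compress_turn` and `act_f1`, and the placeholder labels, are illustrative assumptions and not the paper's implementation.

```python
from typing import Dict, List, Optional, Set, Tuple

# A dialogue act as (act_type, slot, value); slot and value may be None,
# e.g. ("request", "phone", None) or ("bye", None, None).
Act = Tuple[str, Optional[str], Optional[str]]


def compress_act(act: Act, goal: Dict[str, str]) -> Act:
    """Replace a concrete slot value with a goal-relative placeholder.

    Hypothetical compression scheme for illustration only.
    """
    act_type, slot, value = act
    if slot is None or value is None:
        # Acts without a value (requests, bye, ...) are left unchanged.
        return act
    if slot in goal:
        placeholder = "goal_value" if goal[slot] == value else "other_value"
    else:
        placeholder = "unconstrained"
    return (act_type, slot, placeholder)


def compress_turn(acts: List[Act], goal: Dict[str, str]) -> Set[Act]:
    """Compress every act in a user turn; duplicates collapse in the set."""
    return {compress_act(a, goal) for a in acts}


def act_f1(predicted: Set[Act], reference: Set[Act]) -> float:
    """Set-based F1 between predicted and reference dialogue acts."""
    if not predicted and not reference:
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    goal = {"food": "italian", "area": "north"}
    turn = [("inform", "food", "italian"),
            ("inform", "area", "south"),
            ("request", "phone", None)]
    reference = compress_turn(turn, goal)
    predicted = {("inform", "food", "goal_value"), ("request", "phone", None)}
    print(sorted(reference, key=str))
    print(act_f1(predicted, reference))  # precision 1.0, recall 2/3, F1 ≈ 0.8
```

Mapping many concrete values onto a few goal-relative placeholders shrinks the number of distinct acts a model has to predict. This is the intuition behind compressing the act space to counter data sparsity, although the paper's actual mapping may differ.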
