Optimisation for POMDP-Based Spoken Dialogue Systems

Spoken dialogue systems (SDS) allow users to interact with a wide variety of information systems using speech as the primary, and often the only, communication medium. The principal elements of an SDS are a speech understanding component, which converts each spoken input into an abstract semantic representation called a user dialogue act (see Chap. 3); a dialogue manager, which determines the system's response to each user input in the form of a system act a_t; and a message generator, which converts each system act back into speech (see Chap. 6). At each turn t, the system updates its state s_t and, based on a policy π, determines the next system act a_t = π(s_t). The state consists of the variables needed to track the progress of the dialogue, together with the attribute values (often called slots) that specify the user's requirements. In conventional systems, as discussed in Chap. 8, the policy is usually defined by a flow chart whose nodes represent states and actions and whose arcs represent user inputs.
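The turn cycle described above can be sketched as a minimal hand-crafted dialogue manager: the state tracks slot values, and a policy π maps the state to the next system act. This is an illustrative sketch only; all names (DialogueState, update_state, policy, and the slot names) are hypothetical and not taken from the text.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Tracks dialogue progress and the user-requirement slots."""
    turn: int = 0
    slots: dict = field(default_factory=dict)  # e.g. {"food": "italian"}

def update_state(state: DialogueState, user_act: dict) -> DialogueState:
    """Fold the user's dialogue act (slot/value pairs) into the state s_t."""
    state.turn += 1
    state.slots.update(user_act.get("inform", {}))
    return state

def policy(state: DialogueState) -> str:
    """pi(s_t) -> a_t: a toy flow-chart-style policy over two slots."""
    for slot in ("food", "area"):
        if slot not in state.slots:
            return f"request({slot})"
    return "offer(restaurant)"

# One simulated dialogue: the manager requests each missing slot in turn,
# then makes an offer once the user's requirements are fully specified.
state = DialogueState()
acts = []
for user_act in [{"inform": {}},
                 {"inform": {"food": "italian"}},
                 {"inform": {"area": "centre"}}]:
    state = update_state(state, user_act)
    acts.append(policy(state))
print(acts)  # -> ['request(food)', 'request(area)', 'offer(restaurant)']
```

A POMDP-based manager replaces the deterministic state s_t with a belief distribution over states and learns π rather than hand-coding it, but the turn-level control loop has the same shape.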
