Scaling up POMDPs for Dialog Management: The "Summary POMDP" Method

Partially observable Markov decision processes (POMDPs) have been shown to be a promising framework for dialog management in spoken dialog systems. To date, however, POMDP-based dialog managers have been limited to artificially small tasks. In this work, we present a novel method, the "summary POMDP", for scaling slot-filling POMDP-based dialog managers to tasks of a realistic size. We present an example dialog problem that incorporates a user model built from real dialog data. A dialog manager is created using this method and evaluated against a second user model built from held-out dialog data. Results confirm that summary POMDP policies scale well and show that they are reasonably robust to variations in user behavior.
