The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems. It briefly summarises the basic mathematics and explains why exact optimisation is intractable. It then describes in some detail a form of approximation, the Hidden Information State (HIS) model, which does scale and can be used to build practical systems. A prototype HIS system for the tourist information domain is evaluated and compared with a baseline MDP system using both user simulations and a live user trial. The results give strong support to the central contention that the POMDP-based framework is a tractable and powerful approach to building more robust spoken dialogue systems.
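To make the "basic mathematics" concrete, the following is a minimal sketch of the standard POMDP belief update, b'(s') ∝ P(o | s', a) Σ_s P(s' | s, a) b(s), on a toy two-state problem. It is not the HIS approximation itself. The function name, the dictionary layout, the two user goals, and all probabilities are illustrative assumptions, not values taken from the paper.

```python
def belief_update(belief, action, observation, transition, observation_model):
    """Standard POMDP belief update (illustrative layout, not the HIS model).

    belief:            dict state -> probability
    transition:        dict (state, action) -> dict next_state -> probability
    observation_model: dict (next_state, action) -> dict observation -> probability
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from the current belief.
        predicted = sum(transition[(s, action)][s_next] * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = observation_model[(s_next, action)][observation] * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical toy domain: the user wants either a hotel or a restaurant.
STATES = ["hotel", "restaurant"]
ACTION = "ask"

# Assumption: the user's goal does not change during the dialogue.
TRANSITION = {
    (s, ACTION): {s2: 1.0 if s2 == s else 0.0 for s2 in STATES} for s in STATES
}

# Assumption: the recogniser reports the true goal 80% of the time.
OBSERVATION_MODEL = {
    (s, ACTION): {o: 0.8 if o == s else 0.2 for o in STATES} for s in STATES
}

prior = {"hotel": 0.5, "restaurant": 0.5}
after_one = belief_update(prior, ACTION, "hotel", TRANSITION, OBSERVATION_MODEL)
after_two = belief_update(after_one, ACTION, "hotel", TRANSITION, OBSERVATION_MODEL)
```

Each update loops over every pair of states, so the cost grows quadratically with the size of the state space. This is one way to see why maintaining and optimising over an exact belief becomes impractical for realistic dialogue domains, and why an approximation that scales is needed.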
