Sequential Decision Making in Spoken Dialog Management

This chapter comprises two major sections. Section 3.1 introduces sequential decision making and the mathematical frameworks that support it: the Markov decision process (MDP) and the partially observable MDP (POMDP), together with the well-known algorithms for solving them. Section 3.2 introduces spoken dialog systems (SDSs) and reviews related work on sequential decision making in spoken dialog management, in particular research applying the POMDP framework to dialog management. Finally, we review the user modeling techniques that have been used for dialog POMDPs.
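To ground the algorithmic material previewed above, the following sketch shows value iteration, the classic dynamic-programming algorithm for solving MDPs that Sect. 3.1 covers. The MDP itself (its two states, two actions, transition probabilities, and rewards) is an invented toy example, not one drawn from this chapter:

```python
import numpy as np

# Toy 2-state, 2-action MDP (all numbers are illustrative assumptions).
# P[a][s][s'] = probability of moving from state s to s' under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.1, 0.9]],   # transitions under action 1
])
# R[s][a] = immediate reward for taking action a in state s.
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeated Bellman backups; returns V* and a greedy policy."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[s][a] = R[s][a] + gamma * sum_s' P[a][s][s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
```

Value iteration converges for any discount factor below one because the Bellman backup is a contraction; solving a POMDP is much harder, since the backup must operate over a continuous belief space rather than a finite state set, which motivates the approximate methods discussed in Sect. 3.1.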
