Efficient model learning for dialog management

Intelligent planning algorithms such as the Partially Observable Markov Decision Process (POMDP) have succeeded in dialog management applications [10, 11, 12] because they are robust to the inherent uncertainty of human interaction. Like all dialog planning systems, however, a POMDP requires an accurate model of the user (e.g., what the user might say or want). POMDPs are generally specified with a large probabilistic model whose many parameters are difficult to set from domain knowledge alone, and gathering enough data to estimate them accurately a priori is expensive. In this paper, we take a Bayesian approach to learning the user model simultaneously with the dialog manager's policy. At the heart of our approach is an efficient incremental update algorithm that lets the dialog manager replan just long enough to improve the current dialog policy given data from recent interactions. The update has a small computational cost, preventing long delays in the interaction. The resulting dialog manager learns robustly from interaction data, outperforming a hand-coded model both in simulation and on a robotic wheelchair.
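The Bayesian model-learning idea above can be illustrated with a minimal sketch: keep a Dirichlet posterior over each conditional distribution in the user model and update its pseudo-counts as dialog data arrives. The class and parameter names here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

class DirichletUserModel:
    """Illustrative sketch: Dirichlet posteriors over the user-model
    distribution p(observation | state, action), updated from dialogs."""

    def __init__(self, n_states, n_actions, n_obs, prior=1.0):
        # One Dirichlet per (state, action); `prior` is a symmetric
        # pseudo-count encoding initial domain knowledge.
        self.counts = np.full((n_states, n_actions, n_obs), prior)

    def update(self, state, action, obs):
        # Conjugate update: each observed outcome adds one count.
        self.counts[state, action, obs] += 1.0

    def mean(self, state, action):
        # Posterior mean estimate of p(obs | state, action).
        c = self.counts[state, action]
        return c / c.sum()

    def sample(self, rng, state, action):
        # Draw one plausible model, e.g. for replanning against
        # sampled POMDP parameters.
        return rng.dirichlet(self.counts[state, action])
```

After each dialog turn the manager would call `update`, and the planner could periodically replan against the posterior mean or a few sampled models; the conjugate update is what keeps the per-interaction cost small.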

[1] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.

[2] Lawrence R. Rabiner, et al. A tutorial on Hidden Markov Models, 1986.

[3] C. Watkins. Learning from delayed rewards, 1989.

[4] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.

[5] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.

[6] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.

[7] Mosur Ravishankar, et al. Efficient Algorithms for Speech Recognition, 1996.

[8] Eric A. Hansen, et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.

[9] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.

[10] David Andre, et al. Model based Bayesian Exploration, 1999, UAI.

[11] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.

[12] Joelle Pineau, et al. Spoken Dialogue Management Using Probabilistic Reasoning, 2000, ACL.

[13] Stephanie Seneff, et al. Dialogue Management in the Mercury Flight Reservation System, 2000.

[14] Marilyn A. Walker, et al. NJFun: A Reinforcement Learning Spoken Dialogue System, 2000.

[15] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.

[16] Kee-Eung Kim, et al. Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers, 2000, AIPS.

[17] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[18] M. Karahan, et al. Combining classifiers for spoken language understanding, 2003, IEEE Workshop on Automatic Speech Recognition and Understanding.

[19] Laurent El Ghaoui, et al. Robustness in Markov Decision Problems with Uncertain Transition Matrices, 2003, NIPS.

[20] James R. Glass. A probabilistic framework for segment-based speech recognition, 2003, Comput. Speech Lang.

[21] Sebastian Thrun, et al. Perspectives on standardization in mobile robot programming: the Carnegie Mellon Navigation (CARMEN) Toolkit, 2003, IROS.

[22] Eric Horvitz, et al. Optimizing Automated Call Routing by Integrating Spoken Dialog Models with Queuing Models, 2004, NAACL.

[23] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.

[24] Shimei Pan, et al. Designing and Evaluating an Adaptive Spoken Dialogue System, 2002, User Modeling and User-Adapted Interaction.

[25] Michael L. Littman, et al. Planning with predictive state representations, 2004, ICMLA.

[26] Joelle Pineau, et al. A Hierarchical Approach to POMDP Planning and Execution, 2004.

[27] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.

[28] Joelle Pineau, et al. Active Learning in Partially Observable Markov Decision Processes, 2005, ECML.

[29] Yishay Mansour, et al. Reinforcement Learning in POMDPs Without Resets, 2005, IJCAI.

[30] Jesse Hoey, et al. Solving POMDPs with Continuous or Large Discrete Observation Spaces, 2005, IJCAI.

[31] J. D. Williams, et al. Scaling up POMDPs for Dialog Management: The "Summary POMDP" Method, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.

[32] Doina Precup, et al. Learning in non-stationary Partially Observable Markov Decision Processes, 2005.

[33] Pascal Poupart, et al. Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management, 2008, SIGDIAL.

[34] Alfred Kobsa. User Modeling and User-Adapted Interaction, 2005, User Modeling and User-Adapted Interaction.

[35] Sridhar Mahadevan, et al. Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, 2005, NIPS.

[36] Jesse Hoey, et al. POMDP Models for Assistive Technology, 2005, AAAI Fall Symposium: Caring Machines.

[37] Pascal Poupart, et al. Point-Based Value Iteration for Continuous POMDPs, 2006, J. Mach. Learn. Res.

[38] Jesse Hoey, et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.

[39] Shie Mannor, et al. The Robustness-Performance Tradeoff in Markov Decision Processes, 2006, NIPS.

[40] Geoffrey J. Gordon, et al. Point-based approximations for fast POMDP solving, 2006.

[41] Thomas Hofmann, et al. Automated Hierarchy Discovery for Planning in Partially Observable Environments, 2007.

[42] Sriraam Natarajan, et al. A Decision-Theoretic Model of Assistance, 2007, IJCAI.

[43] Steve J. Young, et al. Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.