Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers

Modelling dialogue as a Partially Observable Markov Decision Process (POMDP) makes it possible to learn a dialogue policy that is robust to speech understanding errors. However, a major challenge in POMDP policy learning is maintaining tractability, so approximation is unavoidable. We propose applying Gaussian Processes to reinforcement learning of optimal POMDP dialogue policies in order to (1) speed up learning and (2) obtain an estimate of the uncertainty of the approximation. We first demonstrate the idea on a simple voice mail dialogue task and then apply the method to a real-world tourist information dialogue task.
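To illustrate what "an estimate of the uncertainty of the approximation" can look like, here is a minimal sketch of plain Gaussian Process regression over a one-dimensional belief variable. It is not the learning algorithm of the paper: the kernel choice, the hyperparameters, the function names (`rbf_kernel`, `gp_posterior`), and the data points are all assumptions made up for illustration. The sketch only shows that a GP returns a posterior variance alongside its mean, and that the variance grows away from the observed points.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_query, noise_var=0.01):
    """Posterior mean and variance of a zero-mean GP at the query points."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    chol = np.linalg.cholesky(k_tt)  # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = np.diag(rbf_kernel(x_query, x_query)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical data: belief points and made-up noisy return observations.
beliefs = np.array([[0.1], [0.2], [0.5], [0.6]])
returns = np.array([-1.0, -0.5, 1.0, 1.2])

# One query at an observed belief point, one far from all observations.
query = np.array([[0.2], [0.95]])
mean, var = gp_posterior(beliefs, returns, query)
print("posterior mean:", mean)
print("posterior variance:", var)
```

The posterior variance at the query far from the data is larger than at the observed point. This is the kind of uncertainty signal a learner could use, for example, to decide where more dialogue experience is needed.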
