Learning multi-goal dialogue strategies using reinforcement learning with reduced state-action spaces

Learning dialogue strategies within the reinforcement learning framework is problematic due to its high computational cost. In this paper we propose an algorithm that reduces a state-action space to one that includes only valid state-actions. We performed experiments on full and reduced spaces using three systems (with 5, 9 and 20 slots) in the travel domain, using a simulated environment. The task was to learn multi-goal dialogue strategies optimizing single and multiple confirmations. Averaged results for strategies learnt on reduced spaces reveal the following benefits over full spaces: 1) less computer memory (94% reduction), 2) faster learning (93% faster convergence) and 3) better performance (8.4% fewer time steps and 7.7% higher reward).

Index Terms: reinforcement learning, spoken dialogue systems.
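The idea of pruning a state-action space down to the valid state-actions can be illustrated with a minimal sketch. This is a hypothetical example, not the authors' exact algorithm: it assumes a slot-filling dialogue where each slot is unknown, filled, or confirmed, and the only actions are asking for an unknown slot or confirming a filled one, so any other (state, action) pair can be dropped before learning.

```python
from itertools import product

# Each slot takes one of three values: 0 = unknown, 1 = filled, 2 = confirmed.
# Actions are ("ask", i) or ("confirm", i) for each slot i (illustrative only).

def valid_actions(state):
    """Keep only actions that make sense in this state:
    ask an unknown slot, or confirm a filled-but-unconfirmed slot."""
    acts = []
    for i, value in enumerate(state):
        if value == 0:
            acts.append(("ask", i))
        elif value == 1:
            acts.append(("confirm", i))
    return acts

def space_sizes(n_slots):
    """Compare the full state-action space (every state paired with
    every action) against the reduced space of valid pairs only."""
    states = list(product(range(3), repeat=n_slots))
    full = len(states) * 2 * n_slots
    reduced = sum(len(valid_actions(s)) for s in states)
    return full, reduced

full, reduced = space_sizes(5)
print(full, reduced, 1 - reduced / full)  # reduction fraction
```

Even in this toy setting the reduction is substantial (two thirds of the pairs are invalid for 5 slots), and since a tabular learner such as Q-learning stores one value per state-action pair, the saving translates directly into less memory and fewer pairs to explore.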
