Fast Reinforcement Learning of Dialogue Policies Using Stable Function Approximation

We propose a method to speed up reinforcement learning of policies for spoken dialogue systems. The method combines a coarse-grained abstract representation of states and actions with learning restricted to frequently visited states; the values of unsampled states are approximated by linear interpolation over the values of known states. Experiments show that the proposed method effectively optimizes dialogue strategies for frequently visited dialogue states.
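The core idea lends itself to a short sketch: keep explicit value estimates only for abstract states that have been visited often enough, and answer queries about any other state by interpolating over the stored estimates. Interpolation of this kind is an averager in Gordon's (1995) sense, which is what makes the approximation stable under dynamic-programming-style updates. The code below is a minimal illustration under stated assumptions, not the paper's exact formulation: the class name `InterpolatedValueFunction`, the `visit_threshold` and `k_neighbors` parameters, and the inverse-distance weighting scheme are all hypothetical choices, and abstract dialogue states are assumed to be numeric feature vectors.

```python
import numpy as np
from collections import defaultdict


class InterpolatedValueFunction:
    """Value function over a coarse-grained abstract state space.

    Values are stored only for frequently visited states ("anchors");
    the value of an unsampled state is approximated by distance-weighted
    linear interpolation over the anchors. Such interpolation is an
    averager, so the combined learning update remains stable.
    NOTE: names and parameters here are illustrative assumptions.
    """

    def __init__(self, visit_threshold=10, k_neighbors=4):
        self.visit_threshold = visit_threshold  # visits before a state gets its own entry
        self.k_neighbors = k_neighbors          # anchors used for interpolation
        self.visits = defaultdict(int)          # state tuple -> visit count
        self.values = {}                        # state tuple -> learned value

    def value(self, state):
        state = tuple(state)
        if state in self.values:
            return self.values[state]
        if not self.values:
            return 0.0
        # Interpolate from the k nearest anchors with inverse-distance weights.
        anchors = np.array(list(self.values.keys()), dtype=float)
        targets = np.array(list(self.values.values()))
        dists = np.linalg.norm(anchors - np.asarray(state, dtype=float), axis=1)
        idx = np.argsort(dists)[: self.k_neighbors]
        w = 1.0 / (dists[idx] + 1e-8)
        return float(np.dot(w, targets[idx]) / w.sum())

    def update(self, state, target, alpha=0.1):
        state = tuple(state)
        self.visits[state] += 1
        # Learn a dedicated value only once the state is visited often enough;
        # rarely visited states never get their own parameter.
        if self.visits[state] >= self.visit_threshold:
            old = self.values.get(state, self.value(state))
            self.values[state] = old + alpha * (target - old)


if __name__ == "__main__":
    vf = InterpolatedValueFunction(visit_threshold=3)
    for _ in range(3):
        vf.update((1.0, 0.0), target=1.0)   # frequently visited: gets its own entry
    print(vf.value((1.0, 0.0)))             # learned estimate
    print(vf.value((0.9, 0.1)))             # unsampled: interpolated from anchors
```

Restricting learned entries to frequently visited states keeps the number of free parameters small, which is what speeds up learning; using an averager for the remaining states ensures the fitted values cannot diverge.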
