Hierarchical Dialogue Policy Learning using Flexible State Transitions and Linear Function Approximation

Conversational agents that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem can be addressed either by using function approximation techniques that estimate an approximate true value function, or by using a hierarchical decomposition of a learning task into subtasks. In this paper, we present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control with function approximation. The approach incorporates two concepts to allow flexible switching between subdialogues, extending current hierarchical reinforcement learning methods. First, hierarchical tree-based state representations initially represent a compact portion of the possible state space and are then dynamically extended in real time. Second, we allow state transitions across sub-dialogues to support non-strict hierarchical control. Our approach is integrated, and tested with real users, in a robot dialogue system that learns to play Quiz games.
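The abstract's first ingredient, linear function approximation of the value function, can be sketched as follows. This is a minimal illustration of generic linear Q-learning, not the paper's implementation: the feature vectors and update rule below are standard textbook machinery, and all names and values are invented for demonstration.

```python
import numpy as np

def q_value(w, phi):
    """Approximate Q(s, a) as a dot product of weights and features."""
    return float(np.dot(w, phi))

def td_update(w, phi, reward, phi_next_best, alpha=0.1, gamma=0.95):
    """One gradient step on the TD error for linear Q-learning.

    phi            -- feature vector of the current state-action pair
    phi_next_best  -- features of the greedy action in the next state
    """
    td_error = reward + gamma * q_value(w, phi_next_best) - q_value(w, phi)
    return w + alpha * td_error * phi

# Toy example: three binary features per state-action pair.
w = np.zeros(3)
phi = np.array([1.0, 0.0, 1.0])        # features of current (s, a)
phi_next = np.array([0.0, 1.0, 1.0])   # features of greedy (s', a')
w = td_update(w, phi, reward=1.0, phi_next_best=phi_next)
```

Because the weight vector generalizes across states that share features, the agent need not enumerate the full state space, which is the scalability benefit the abstract pairs with hierarchical decomposition.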
