Hierarchical Dialogue Policy Learning using Flexible State Transitions and Linear Function Approximation

Conversational agents that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem can be addressed either by using function approximation techniques that estimate an approximate true value function, or by using a hierarchical decomposition of a learning task into subtasks. In this paper, we present a novel approach for dialogue policy optimization that combines the benefits of hierarchical control with function approximation. The approach incorporates two concepts to allow flexible switching between subdialogues, extending current hierarchical reinforcement learning methods. First, hierarchical tree-based state representations initially represent a compact portion of the possible state space and are then dynamically extended in real time. Second, we allow state transitions across sub-dialogues to support non-strict hierarchical control. Our approach is integrated, and tested with real users, in a robot dialogue system that learns to play Quiz games.
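The abstract's first ingredient, linear function approximation of the value function, can be sketched as follows. This is a minimal illustration of generic linear Q-learning, not the paper's implementation: the feature vectors and update rule below are standard textbook machinery, and all names and values are invented for demonstration.

```python
import numpy as np

def q_value(w, phi):
    """Approximate Q(s, a) as a dot product of weights and features."""
    return float(np.dot(w, phi))

def td_update(w, phi, reward, phi_next_best, alpha=0.1, gamma=0.95):
    """One gradient step on the TD error for linear Q-learning.

    phi            -- feature vector of the current state-action pair
    phi_next_best  -- features of the greedy action in the next state
    """
    td_error = reward + gamma * q_value(w, phi_next_best) - q_value(w, phi)
    return w + alpha * td_error * phi

# Toy example: three binary features per state-action pair.
w = np.zeros(3)
phi = np.array([1.0, 0.0, 1.0])        # features of current (s, a)
phi_next = np.array([0.0, 1.0, 1.0])   # features of greedy (s', a')
w = td_update(w, phi, reward=1.0, phi_next_best=phi_next)
```

Because the weight vector generalizes across states that share features, the agent need not enumerate the full state space, which is the scalability benefit the abstract pairs with hierarchical decomposition.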
