Hybrid reinforcement/supervised learning for dialogue policies from COMMUNICATOR data

We propose a method for learning dialogue management policies from a fixed dataset. The method is designed for use with “Information State Update” (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in very large state and policy spaces. To address the problem that any fixed dataset provides information about only small portions of these spaces, we propose a hybrid model that combines reinforcement learning (RL) with supervised learning: the RL component optimises a measure of dialogue reward, while the supervised component restricts the learnt policy to the portion of the space for which we have data. Linear function approximation is used to handle the large state space efficiently. We trained this model on a subset of the COMMUNICATOR corpus, to which we added annotations for user actions and Information States. When tested with a user simulation trained on the same data, our model outperforms all of the systems in the COMMUNICATOR data, scoring 37% higher than the best COMMUNICATOR system. These advances improve techniques for bootstrapping and automatically optimising dialogue management policies from limited initial datasets.
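To make the combination concrete, the sketch below shows one way such a hybrid objective could be set up: a linear action-value function Q(s, a) = w_a · φ(s) trained from a fixed dataset of logged turns, mixing a temporal-difference update (the RL term, driving the policy towards high dialogue reward) with a log-likelihood update on the action actually taken in the corpus (the supervised term, keeping the policy inside the region covered by the data). This is a minimal illustrative sketch, not the paper's implementation; the class name, the softmax form of the supervised term, the Q-learning-style TD target, and all hyperparameters (alpha, gamma, the mixing weight lam) are assumptions.

```python
import numpy as np

class HybridLinearPolicy:
    """Illustrative hybrid RL/supervised learner with linear function
    approximation: Q(s, a) = w[a] . phi(s). All names and hyperparameters
    are assumptions for illustration, not the paper's implementation."""

    def __init__(self, n_actions, n_features, alpha=0.01, gamma=0.95, lam=0.5):
        self.w = np.zeros((n_actions, n_features))  # one weight vector per action
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.lam = lam      # mixes the RL and supervised terms

    def q(self, phi):
        # Q-values for every action in the state with feature vector phi.
        return self.w @ phi

    def update(self, phi, a, r, phi_next, done):
        """One update from a logged turn (phi, a, r, phi_next) in the fixed dataset."""
        # RL term: one-step TD error towards the dialogue reward
        # (Q-learning-style target, applied offline to the corpus).
        target = r if done else r + self.gamma * np.max(self.q(phi_next))
        td_error = target - self.q(phi)[a]
        self.w[a] += self.alpha * self.lam * td_error * phi

        # Supervised term: gradient ascent on the softmax log-likelihood of
        # the action actually taken in the data, which pulls the learnt
        # policy towards the portion of policy space the corpus covers.
        logits = self.q(phi)
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        grad = -probs
        grad[a] += 1.0  # d/d-logits of log p(a | phi)
        self.w += self.alpha * (1 - self.lam) * np.outer(grad, phi)

    def act(self, phi):
        # Greedy policy over the learnt Q-values.
        return int(np.argmax(self.q(phi)))
```

Under this sketch, training would consist of repeated sweeps over the annotated corpus, calling update once per logged turn; setting lam to 1 recovers pure offline RL, while lam = 0 reduces to supervised imitation of the systems in the data.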
