Simultaneous feature selection and parameter optimization for training of dialog policy by reinforcement learning

This paper addresses feature selection in reinforcement learning (RL) of dialog policies for spoken dialog systems. A statistical dialog manager selects the action the system should take based on features derived from the current dialog state and/or the system's belief state. When defining these features, however, it is not obvious how to find the subset of actually effective features among the potentially useful ones. Moreover, this selection should be carried out simultaneously with the optimization of the dialog policy. We propose an incremental feature selection method in which the dialog policy is improved and the features are selected at the same time during policy optimization by RL. Experiments on dialog policy optimization by RL with a user simulator demonstrated that 1) the proposed method finds a better dialog policy with fewer policy iterations, and 2) its learning speed is comparable to that of a system whose features are selected in advance.
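
To make the idea of interleaving feature selection with policy optimization concrete, the following Python sketch implements one plausible variant: greedy forward selection, where each remaining candidate feature is scored by (simulated) policy evaluation and the best one is kept only if it improves the policy. Everything here is an illustrative assumption, not the paper's actual method: evaluate_policy is a synthetic stand-in for RL policy iterations with a user simulator, the ground-truth "useful features" and the stopping rule are invented for the example.

```python
import numpy as np

# Hypothetical illustration only: incremental forward feature selection
# interleaved with policy optimization. The scoring function, candidate
# features, and stopping rule are placeholders, not the paper's method.

rng = np.random.default_rng(0)

def evaluate_policy(active_features, n_dialogs=200):
    """Stand-in for running policy iterations with a user simulator and
    returning average dialog reward (here: a synthetic noisy score)."""
    # Synthetic ground truth: only features 0-4 are actually useful;
    # every active feature also carries a small cost.
    useful = [f for f in active_features if f < 5]
    return len(useful) - 0.1 * len(active_features) + rng.normal(0, 0.05)

candidate_features = list(range(20))   # potentially useful features
active = []                            # features selected so far
best_score = evaluate_policy(active)

for _ in range(len(candidate_features)):
    # Score one round of policy improvement for each remaining candidate.
    scores = {f: evaluate_policy(active + [f])
              for f in candidate_features if f not in active}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:           # no candidate improves the policy
        break
    active.append(f_best)              # keep the feature and improved policy
    best_score = s_best

print("selected features:", sorted(active), "score:", round(best_score, 3))
```

Under these assumptions, selection stops as soon as no candidate feature yields a better policy, so ineffective features are never added and policy improvement and feature selection proceed in the same loop.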
