Will my Spoken Dialogue System be a Slow Learner?

This paper presents a practical methodology for integrating reinforcement learning into the design of a Spoken Dialogue System (SDS). The method lets SDS designers estimate, in advance, the number of dialogues their system will need in order to learn the value of each state-action couple. The designer first provides a simple user model. We then run simulations with this model and compute confidence intervals for the mean of the expected return of each state-action couple.
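The simulate-then-estimate loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy one-turn user model, the state and action names, the reward values, and the use of a normal-approximation interval (z = 1.96) rather than whatever interval the authors compute.

```python
import math
import random
from collections import defaultdict


def simulate_dialogue(rng):
    """Hypothetical one-turn dialogue against a toy user model.

    Returns the visited (state, action) couples and the dialogue return.
    """
    # In state "ask", the system picks one of two question styles at random.
    action = rng.choice(["open", "directed"])
    # Assumed user model: directed questions succeed more often.
    success_prob = 0.6 if action == "open" else 0.8
    ret = -1.0  # assumed per-turn cost
    if rng.random() < success_prob:
        ret += 10.0  # assumed reward for task success
    return [("ask", action)], ret


def confidence_intervals(n_dialogues, seed=0, z=1.96):
    """Map each state-action couple to (mean return, CI half-width, samples)."""
    rng = random.Random(seed)
    returns = defaultdict(list)
    for _ in range(n_dialogues):
        visited, ret = simulate_dialogue(rng)
        for couple in visited:
            returns[couple].append(ret)
    result = {}
    for couple, rs in returns.items():
        n = len(rs)
        mean = sum(rs) / n
        # Unbiased sample variance; zero if only one sample is available.
        var = sum((r - mean) ** 2 for r in rs) / (n - 1) if n > 1 else 0.0
        result[couple] = (mean, z * math.sqrt(var / n), n)
    return result


def samples_needed(std, half_width, z=1.96):
    """Samples required for a normal-approximation CI of the given half-width."""
    return math.ceil((z * std / half_width) ** 2)


if __name__ == "__main__":
    for couple, (mean, half, n) in sorted(confidence_intervals(2000).items()):
        print(couple, f"mean={mean:.2f} +/- {half:.2f} (n={n})")
```

Running the simulation with more dialogues narrows each interval, and `samples_needed` inverts that relation to give a rough per-couple sample budget. Turning that budget into a number of dialogues would also require the visit frequency of each couple, which this sketch does not model.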
