REINFORCEMENT LEARNING WITH SIMULATED USER FOR AUTOMATIC DIALOG STRATEGY OPTIMIZATION

In this paper, we propose a solution to the problem of formulating strategies for a spoken dialog system. Our approach is based on reinforcement learning with the help of a simulated user in order to identify an optimal dialog strategy. Our method considers the Markov decision process to be a framework for representation of speech dialog in which the states represent history and discourse context, the actions are dialog acts and the transition strategies are decisions on actions to take between states. We present our reinforcement learning architecture with a novel objective function that is based on dialog quality rather than its duration.