User Simulation for Spoken Dialog System Development

A user simulation is a computer program that simulates human user behavior. Recently, user simulations have been widely used in two spoken dialog system development tasks: generating large simulated corpora so that machine learning can be applied to learn new dialog strategies, and replacing human users to test dialog system performance. Although previous studies have shown successful examples of applying user simulations in both tasks, it is not clear what type of user simulation is most appropriate for a specific task, because few studies compare different user simulations in the same experimental setting. In this research, we investigate how to construct user simulations for a specific task in spoken dialog system development. Since most current user simulations generate user actions based on probabilistic models, we identify two main factors in constructing such simulations: the choice of user simulation model and the approach used to set up user action probabilities. We build user simulation models that differ in how much they emphasize simulating realistic user behavior versus exploring a wider range of user actions. We also investigate different manual and trained approaches to setting up user action probabilities, and we introduce both task-dependent and task-independent measures to compare these simulations. We show that a simulated user which mimics realistic user behavior is not always necessary for the dialog strategy learning task. For the dialog system testing task, a user simulation which simulates user behavior in a statistical way can generate both objective and subjective measures of dialog system performance similar to those obtained from human users. Our research examines the strengths and weaknesses of user simulations in spoken dialog system development. Although our results are constrained by our task domain and the resources available, we provide a general framework for comparing user simulations in a task-dependent context. In addition, we summarize and validate a set of evaluation measures that can be used to compare different simulated users, as well as simulated versus human users.
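To make the core idea concrete, the probabilistic user simulations discussed above can be sketched as a model that, given the system's last action, samples a user action from a conditional probability distribution. The sketch below is illustrative only: the action names and probability values are hypothetical, and the distributions stand in for either the manually set or the corpus-trained probabilities described in the abstract.

```python
import random

class ProbabilisticUserSim:
    """Minimal sketch of a probabilistic user simulation.

    action_probs maps each system action to a distribution over
    possible user actions (hypothetical names and values).
    """

    def __init__(self, action_probs):
        self.action_probs = action_probs

    def respond(self, system_action, rng=random):
        # Sample one user action from P(user_action | system_action).
        dist = self.action_probs[system_action]
        actions, weights = zip(*dist.items())
        return rng.choices(actions, weights=weights, k=1)[0]

# Hand-set probabilities (a "manual" setup); a "trained" setup would
# instead estimate these distributions from a dialog corpus.
sim = ProbabilisticUserSim({
    "ask_slot": {"provide_value": 0.7, "silence": 0.2, "off_topic": 0.1},
    "confirm":  {"affirm": 0.8, "deny": 0.2},
})
print(sim.respond("confirm"))
```

A more realistic simulation would condition on richer dialog state (e.g., dialog history or user goals) rather than only the last system action, which is one axis along which the models compared in this work differ.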
