Evaluating Spoken Language Systems

Spoken language systems (SLSs) for accessing information sources or services through the telephone network and the Internet are currently being trialed and deployed for a variety of tasks. Evaluating the usability of different interface designs requires a method for comparing the performance of different versions of an SLS. Recently, Walker et al. (1997) proposed PARADISE (PARAdigm for DIalogue System Evaluation) as a general methodology for evaluating SLSs. The PARADISE framework models user satisfaction with an SLS as a linear combination of measures reflecting both task success and dialogue costs. As a test of this methodology, we applied PARADISE to dialogues collected with three SLSs. This paper describes the salient measures identified using PARADISE within and across the three SLSs, and discusses the generalizability of PARADISE performance models.
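Concretely, the PARADISE performance function of Walker et al. (1997) takes the form Performance = α·N(κ) − Σᵢ wᵢ·N(cᵢ), where κ is the kappa statistic for task success, the cᵢ are dialogue-cost measures, N is a Z-score normalization, and α and the wᵢ are estimated by multiple linear regression against user-satisfaction ratings. The sketch below (not code from the paper; the data and variable names are purely illustrative) shows how such a fit could be carried out:

```python
import numpy as np

def zscore(x):
    """Z-score normalization N(x) = (x - mean(x)) / std(x), as used in PARADISE."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fit_paradise(kappa, costs, satisfaction):
    """Fit the PARADISE performance function
        Performance = alpha * N(kappa) - sum_i w_i * N(c_i)
    by ordinary least squares against observed user satisfaction.

    kappa:        (n,) task-success scores (kappa per dialogue)
    costs:        (n, k) dialogue-cost measures (e.g., elapsed time, ASR errors)
    satisfaction: (n,) user-satisfaction survey scores
    """
    costs = np.asarray(costs, dtype=float)
    # Design matrix: intercept, N(kappa), N(c_1), ..., N(c_k)
    X = np.column_stack(
        [np.ones(len(satisfaction)), zscore(kappa)]
        + [zscore(costs[:, i]) for i in range(costs.shape[1])]
    )
    coefs, *_ = np.linalg.lstsq(X, np.asarray(satisfaction, dtype=float), rcond=None)
    alpha = coefs[1]
    # Costs enter the model with a minus sign, so each w_i is the
    # negated regression coefficient on the corresponding N(c_i).
    weights = [-c for c in coefs[2:]]
    return alpha, weights

# Hypothetical data for five dialogues: kappa, two cost measures
# (elapsed time in seconds, number of ASR rejections), satisfaction (1-5).
kappa = [0.9, 0.7, 0.5, 0.8, 0.6]
costs = [[120, 1], [200, 4], [240, 6], [150, 2], [210, 5]]
sat = [4.5, 3.0, 2.0, 4.0, 2.5]
alpha, weights = fit_paradise(kappa, costs, sat)
print(f"alpha = {alpha:.2f}, cost weights = {[round(w, 2) for w in weights]}")
```

The relative magnitudes of the fitted coefficients indicate which measures contribute most to predicted user satisfaction, which is the sense in which the abstract speaks of "salient measures" identified by PARADISE.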

[1] David S. Pallett. Performance assessment of automatic speech recognizers, 1985.

[2] Raj Reddy, et al. Large-vocabulary speaker-independent continuous speech recognition: the SPHINX system, 1988.

[3] Lewis M. Norton, et al. Beyond Class A: A Proposal for Automatic Evaluation of Discourse, 1990, HLT.

[4] Elizabeth Shriberg, et al. Human-Machine Problem Solving Using Spoken Language Systems (SLS): Factors Affecting Performance and User Satisfaction, 1992, HLT.

[5] Alexander I. Rudnicky, et al. Multi-Site Data Collection and Evaluation in Spoken Language Understanding, 1993, HLT.

[6] Andrew C. Simpson, et al. Black box and glass box evaluation of the SUNDIAL system, 1993, EUROSPEECH.

[7] E. Levin, et al. CHRONUS, the next generation, 1995.

[8] C. Kamm. User interfaces for voice applications, 1994.

[9] Gina-Anne Levow, et al. Designing SpeechActs: issues in speech user interfaces, 1995, CHI '95.

[10] Philippe Bretier, et al. Effective human-computer cooperative spoken dialogue: the AGS demonstrator, 1996, ICSLP.

[11] Margaret King, et al. Evaluating natural language processing systems, 1996, CACM.

[12] Morena Danieli, et al. Metrics for Evaluating Dialogue Strategies in a Spoken Language System, 1996, ArXiv.

[13] Niels Ole Bernsen, et al. Principles for the design of cooperative spoken human-machine dialogue, 1996, ICSLP.

[14] Victor Zue, et al. WHEELS: a conversational system in the automobile classifieds domain, 1996, ICSLP.

[15] Lori Lamel, et al. Dialog in the RAILTEL telephone-based system, 1996, ICSLP.

[16] Roberto Billi, et al. Field trial evaluations of two different information inquiry systems, 1997, Speech Communication.

[17] Marilyn A. Walker, et al. Evaluating competing agent strategies for a voice email agent, 1997, EUROSPEECH.

[18] E. Russell Ritenour, et al. Evaluating spoken dialog systems for telecommunication services, 1997, EUROSPEECH.

[19] Marilyn A. Walker, et al. From novice to expert: the effect of tutorials on user expertise with spoken dialogue systems, 1998, ICSLP.

[20] Marilyn A. Walker, et al. Evaluating Response Strategies in a Web-Based Spoken Dialogue Agent, 1998, ACL.

[21] Shimei Pan, et al. Empirically Evaluating an Adaptable Spoken Dialogue System, 1999, ArXiv.