Evaluating Spoken Language Systems

Spoken language systems (SLSs) for accessing information sources or services through the telephone network and the Internet are currently being trialed and deployed for a variety of tasks. Evaluating the usability of different interface designs requires a method for comparing the performance of different versions of an SLS. Recently, Walker et al. (1997) proposed PARADISE (PARAdigm for DIalogue System Evaluation) as a general methodology for evaluating SLSs. The PARADISE framework models user satisfaction with an SLS as a linear combination of measures reflecting both task success and dialogue costs. As a test of this methodology, we applied PARADISE to dialogues collected with three SLSs. This paper describes the salient measures identified using PARADISE within and across the three SLSs, and discusses the generalizability of PARADISE performance models.
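Concretely, the PARADISE performance function of Walker et al. (1997) takes the form Performance = α·N(κ) − Σᵢ wᵢ·N(cᵢ), where κ is the kappa statistic for task success, the cᵢ are dialogue-cost measures, N is a Z-score normalization, and α and the wᵢ are estimated by multiple linear regression against user-satisfaction ratings. The sketch below (not code from the paper; the data and variable names are purely illustrative) shows how such a fit could be carried out:

```python
import numpy as np

def zscore(x):
    """Z-score normalization N(x) = (x - mean(x)) / std(x), as used in PARADISE."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def fit_paradise(kappa, costs, satisfaction):
    """Fit the PARADISE performance function
        Performance = alpha * N(kappa) - sum_i w_i * N(c_i)
    by ordinary least squares against observed user satisfaction.

    kappa:        (n,) task-success scores (kappa per dialogue)
    costs:        (n, k) dialogue-cost measures (e.g., elapsed time, ASR errors)
    satisfaction: (n,) user-satisfaction survey scores
    """
    costs = np.asarray(costs, dtype=float)
    # Design matrix: intercept, N(kappa), N(c_1), ..., N(c_k)
    X = np.column_stack(
        [np.ones(len(satisfaction)), zscore(kappa)]
        + [zscore(costs[:, i]) for i in range(costs.shape[1])]
    )
    coefs, *_ = np.linalg.lstsq(X, np.asarray(satisfaction, dtype=float), rcond=None)
    alpha = coefs[1]
    # Costs enter the model with a minus sign, so each w_i is the
    # negated regression coefficient on the corresponding N(c_i).
    weights = [-c for c in coefs[2:]]
    return alpha, weights

# Hypothetical data for five dialogues: kappa, two cost measures
# (elapsed time in seconds, number of ASR rejections), satisfaction (1-5).
kappa = [0.9, 0.7, 0.5, 0.8, 0.6]
costs = [[120, 1], [200, 4], [240, 6], [150, 2], [210, 5]]
sat = [4.5, 3.0, 2.0, 4.0, 2.5]
alpha, weights = fit_paradise(kappa, costs, sat)
print(f"alpha = {alpha:.2f}, cost weights = {[round(w, 2) for w in weights]}")
```

The relative magnitudes of the fitted coefficients indicate which measures contribute most to predicted user satisfaction, which is the sense in which the abstract speaks of "salient measures" identified by PARADISE.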

[1] David S. Pallett. Performance assessment of automatic speech recognizers, 1985.

[2] Raj Reddy, et al. Large-vocabulary speaker-independent continuous speech recognition: the SPHINX system, 1988.

[3] Lewis M. Norton, et al. Beyond Class A: A Proposal for Automatic Evaluation of Discourse, 1990, HLT.

[4] Elizabeth Shriberg, et al. Human-Machine Problem Solving Using Spoken Language Systems (SLS): Factors Affecting Performance and User Satisfaction, 1992, HLT.

[5] Alexander I. Rudnicky, et al. Multi-Site Data Collection and Evaluation in Spoken Language Understanding, 1993, HLT.

[6] Andrew C. Simpson, et al. Black box and glass box evaluation of the SUNDIAL system, 1993, EUROSPEECH.

[7] E. Levin, et al. CHRONUS, the next generation, 1995.

[8] C. Kamm. User interfaces for voice applications, 1994.

[9] Gina-Anne Levow, et al. Designing SpeechActs: issues in speech user interfaces, 1995, CHI '95.

[10] Philippe Bretier, et al. Effective human-computer cooperative spoken dialogue: the AGS demonstrator, 1996, ICSLP.

[11] Margaret King, et al. Evaluating natural language processing systems, 1996, CACM.

[12] Morena Danieli, et al. Metrics for Evaluating Dialogue Strategies in a Spoken Language System, 1996, ArXiv.

[13] Niels Ole Bernsen, et al. Principles for the design of cooperative spoken human-machine dialogue, 1996, ICSLP.

[14] Victor Zue, et al. WHEELS: a conversational system in the automobile classifieds domain, 1996, ICSLP.

[15] Lori Lamel, et al. Dialog in the RAILTEL telephone-based system, 1996, ICSLP.

[16] Roberto Billi, et al. Field trial evaluations of two different information inquiry systems, 1997, Speech Communication.

[17] Marilyn A. Walker, et al. Evaluating competing agent strategies for a voice email agent, 1997, EUROSPEECH.

[18] E. Russell Ritenour, et al. Evaluating spoken dialog systems for telecommunication services, 1997, EUROSPEECH.

[19] Marilyn A. Walker, et al. From novice to expert: the effect of tutorials on user expertise with spoken dialogue systems, 1998, ICSLP.

[20] Marilyn A. Walker, et al. Evaluating Response Strategies in a Web-Based Spoken Dialogue Agent, 1998, ACL.

[21] Shimei Pan, et al. Empirically Evaluating an Adaptable Spoken Dialogue System, 1999, ArXiv.