A New ITU-T Recommendation on the Evaluation of Telephone-Based Spoken Dialogue Systems

This article describes recent efforts by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) to agree on common methods for evaluating telephone services based on spoken dialogue systems. These efforts resulted in the approval of a new ITU-T Recommendation, P.851 (2003). On the one hand, the Recommendation summarizes the system, service, and user factors that influence service quality. On the other hand, it gives guidelines on how to evaluate services through subjective interaction experiments in order to determine the user's quality perceptions. A taxonomy displays the relationships between influencing factors and perceived quality dimensions: it puts the different quality aspects into a logical relationship and shows which factors have to be taken into account in the experimental set-up. The article discusses what the new Recommendation has achieved, and also what is still missing in order to obtain more analytic information about the performance of system characteristics and their influence on overall service quality.
