Subjective evaluation of spoken dialogue systems using SERVQUAL method

There is a demand for subjective metrics in spoken dialogue system evaluation. SERVQUAL is a service quality evaluation method developed by marketing academics. It produces a subjective measure of the gap between expectations and perceptions in five service quality dimensions common to all services. We describe how the method was applied to spoken dialogue system evaluation. To make the original method more suitable for this purpose, we modified the test questionnaire and the test process. We demonstrate how the modified method was successfully used in an evaluation of a telephone-based e-mail application. The evaluation gave us directions for further development of the system. In addition, we observed some interesting phenomena, such as variation between genders. We also discuss how the method can be further improved, for example, by dividing the questionnaire into two parts.
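The gap measure mentioned above can be illustrated with a minimal sketch. It assumes the conventional SERVQUAL scoring, in which the gap for a dimension is the mean perception rating minus the mean expectation rating. The dimension names follow the original SERVQUAL scale; the ratings, the function name `gap_scores`, and the data layout are hypothetical and are not taken from the evaluation described here, whose modified questionnaire may differ.

```python
from statistics import mean

# The five dimensions of the original SERVQUAL scale.
DIMENSIONS = ("tangibles", "reliability", "responsiveness", "assurance", "empathy")


def gap_scores(expectations, perceptions):
    """Return the per-dimension gap: mean perception minus mean expectation.

    A negative gap means the service fell short of what users expected.
    """
    return {
        dim: mean(perceptions[dim]) - mean(expectations[dim])
        for dim in DIMENSIONS
    }


# Hypothetical ratings from two test users (invented for illustration only).
expectations = {
    "tangibles": [3, 5],
    "reliability": [7, 7],
    "responsiveness": [6, 6],
    "assurance": [5, 5],
    "empathy": [4, 4],
}
perceptions = {
    "tangibles": [4, 6],
    "reliability": [5, 5],
    "responsiveness": [6, 6],
    "assurance": [4, 6],
    "empathy": [5, 5],
}

gaps = gap_scores(expectations, perceptions)
for dim in DIMENSIONS:
    print(f"{dim}: {gaps[dim]:+}")
```

In this invented data, reliability has the largest negative gap, which is the kind of result that would point to where further development effort is most needed.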
