Data collection and performance evaluation of spoken dialogue systems: the MIT experience

In this paper, we report our efforts in data collection and performance evaluation in support of spoken dialogue system development. We describe two understanding metrics, query density and concept efficiency, which can be interpreted on a per-utterance basis but are measured over the course of a dialogue. We also describe the evaluation infrastructure we have developed to support off-line data processing using our GALAXY client-server architecture [8]. We show how we have used these metrics and mechanisms in the development of a spoken dialogue system for air-travel information.
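The abstract names the two metrics without giving their formulas. As a minimal sketch, assume that query density is the per-dialogue ratio of concepts understood by the system to user queries, and that concept efficiency is the per-dialogue ratio of concepts understood to total concept mentions by the user, each averaged over dialogues. These definitions, and the names `DialogueCounts`, `query_density`, and `concept_efficiency`, are assumptions made for illustration and are not taken from the text above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DialogueCounts:
    """Hypothetical per-dialogue tallies (field names are illustrative)."""
    user_queries: int         # number of user queries in the dialogue
    concepts_understood: int  # concepts the system ended up understanding
    concept_mentions: int     # times the user expressed a concept, repeats included


def _mean_ratio(numerators: Sequence[int], denominators: Sequence[int]) -> float:
    """Average numerator/denominator over dialogues, skipping zero denominators."""
    ratios = [n / d for n, d in zip(numerators, denominators) if d > 0]
    if not ratios:
        raise ValueError("no dialogue has a non-zero denominator")
    return sum(ratios) / len(ratios)


def query_density(dialogues: Sequence[DialogueCounts]) -> float:
    """Assumed definition: mean concepts understood per user query."""
    return _mean_ratio(
        [d.concepts_understood for d in dialogues],
        [d.user_queries for d in dialogues],
    )


def concept_efficiency(dialogues: Sequence[DialogueCounts]) -> float:
    """Assumed definition: mean fraction of concept mentions that were understood."""
    return _mean_ratio(
        [d.concepts_understood for d in dialogues],
        [d.concept_mentions for d in dialogues],
    )


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = [DialogueCounts(4, 6, 8), DialogueCounts(2, 2, 4)]
    print(query_density(sample), concept_efficiency(sample))
```

Under these assumed definitions, both quantities are computed from whole-dialogue counts yet read naturally per utterance: a higher query density means more concepts conveyed per query, and a concept efficiency near 1 means the user rarely had to repeat a concept.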

[1]  James Glass,et al.  Evaluation methodology for a telephone-based conversational system , 1998 .

[2]  R. Stephenson A and V , 1962, The British journal of ophthalmology.

[3]  Victor Zue,et al.  Multilingual spoken-language understanding in the MIT Voyager system , 1995, Speech Commun..

[4]  Joseph Polifroni,et al.  Galaxy-II as an Architecture for Spoken Dialogue Evaluation , 2000, LREC.

[5]  Victor Zue,et al.  Conversational interfaces: advances and challenges , 1997, Proceedings of the IEEE.

[6]  Victor Zue,et al.  GALAXY-II: a reference architecture for conversational system development , 1998, ICSLP.

[7]  Victor Zue,et al.  JUPITER: a telephone-based conversational interface for weather information , 2000, IEEE Trans. Speech Audio Process..

[8]  James R. Glass,et al.  Real-time telephone-based speech recognition in the Jupiter domain , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99.

[9]  James R. Glass,et al.  Lexical modeling of non-native speech for automatic speech recognition , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings.

[10]  Joseph Polifroni,et al.  Integrating recognition confidence scoring with language understanding and dialogue modeling , 2000, INTERSPEECH.