Advances in speech understanding technology have led to many successful working systems. However, the high complexity and wide variety of design methodologies make performance evaluation and error analysis for such systems very difficult. Metrics for individual modules, such as word accuracy, spotting rate, language model coverage, and slot accuracy, are often helpful, but it remains difficult to select or tune the individual modules, or to determine what percentage of the understanding errors each module contributed, based on such metrics alone. This paper proposes a new framework for performance evaluation and error analysis of speech understanding systems, based on comparison with 'best-matched' references obtained from the word graphs when the target words and tags are given. In this framework, all test utterances can be classified by error type, and various understanding metrics can be obtained accordingly. Error analysis approaches based on an error plane are then proposed, with which the sources of understanding errors (e.g., poor acoustic recognition, language model deficiencies, or search errors) can be identified for each utterance. Such a framework is very helpful for the design and analysis of speech understanding systems.
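As a minimal illustrative sketch of the error-type classification idea (not the authors' implementation), one can compare the decoded hypothesis, the 'best-matched' reference drawn from the word graph, and the ground-truth semantic slots for each utterance; all function names, error labels, and slot representations below are hypothetical assumptions introduced for illustration.

```python
# Hypothetical sketch: classify each test utterance by comparing the
# system hypothesis, the 'best-matched' reference from the word graph,
# and the ground-truth slots. Labels and names are illustrative only.

def classify_utterance(hyp_slots, best_match_slots, true_slots):
    """Assign one utterance to a coarse error type.

    hyp_slots        -- slots produced by the full understanding system
    best_match_slots -- slots of the best-matched reference found in the
                        word graph (given the target words and tags)
    true_slots       -- ground-truth slots for the utterance
    """
    if hyp_slots == true_slots:
        return "correct"
    if best_match_slots != true_slots:
        # Even the best path in the word graph cannot recover the truth:
        # attribute the error to acoustic recognition.
        return "recognition_error"
    # The word graph contained a correct path, but the system chose
    # another one: attribute the error to the language model or search.
    return "lm_or_search_error"


def error_breakdown(utterances):
    """Aggregate per-utterance labels into fractions per error source."""
    counts = {}
    for hyp, best, true in utterances:
        label = classify_utterance(hyp, best, true)
        counts[label] = counts.get(label, 0) + 1
    total = len(utterances)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Toy data: each utterance is (hypothesis, best match, ground truth).
    data = [
        ({"city": "Taipei"}, {"city": "Taipei"}, {"city": "Taipei"}),
        ({"city": "Taipei"}, {"city": "Tainan"}, {"city": "Tainan"}),
        ({"city": "Taichung"}, {"city": "Tainan"}, {"city": "Tainan"}),
    ]
    print(error_breakdown(data))
```

This kind of per-utterance breakdown is one way the comparison against best-matched references could separate recognition-side errors from language-model or search errors, in the spirit of the error plane analysis described above.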