Effects of word error rate in the DARPA Communicator data during 2000 and 2001
During 2000 and 2001, two large data collections were performed with paid users. We analyze the effects of speech recognition accuracy, as measured by Word Error Rate (WER), on other metrics. The analysis shows a linear correlation between WER and the Task Completion metrics, and (unexpectedly) this relationship remains approximately linear even at quite high values of WER. The picture for the User Satisfaction metrics is more complex; a linear model derived from the data using the PARADISE framework [1] is given by Walker et al. [2]. We present evidence suggesting a roughly linear relationship between WER and User Satisfaction for WER below about 35% to 40% in 2001, compared with stronger correlations in 2000. Finally, we note that the size of the effect of increasing WER on Task Completion (the slope of the least-squares regression line) appears to be about half as large in 2001 as in 2000, which we attribute to improved strategies for accomplishing tasks despite speech recognition errors. We consider this an important accomplishment of the research groups who built the Communicator implementations.
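The two quantities the abstract relies on, WER and the slope of a least-squares regression line, can be illustrated with a minimal sketch. It assumes the conventional definition of WER as word-level edit distance divided by the number of reference words, and an ordinary least-squares fit of one metric against WER. The function names `word_error_rate` and `least_squares_line` are hypothetical, and nothing here reproduces the paper's scoring tools or its data.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization; real scoring tools normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares fit y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Under these assumptions, one would compute a WER per dialogue, pair it with that dialogue's Task Completion score, and compare the fitted slopes for the two collection years; a slope half as steep means each additional point of WER costs half as much Task Completion.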
[1] Marilyn A. Walker, et al. Towards developing general models of usability with PARADISE, 2000, Natural Language Engineering.
[2] Gregory A. Sanders, et al. DARPA Communicator dialog travel planning systems: the June 2000 data collection, 2001, INTERSPEECH.