Response time as a metric for comparison of speech recognition by humans and machines

The performance of automatic speech recognition systems is usually assessed in terms of error rate. Human listeners make few recognition errors, but the relative difficulty of processing can be assessed with response-time techniques. We report the construction of a measure, analogous to response time, for a machine recognition system; this measure can be compared directly with human response times. We conducted a trial comparison of this type at the phoneme level, covering both tense and lax vowels and a variety of consonant classes. The results suggested similarities between human and machine processing for consonants, but differences for vowels.
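The abstract does not say how the machine "response time" was computed. One way to picture the idea is sketched below. It is an illustrative assumption, not the authors' method. It assumes a frame-based recognizer that outputs a per-frame posterior probability for each phoneme class. The "response time" is then taken to be the time from the target phoneme's onset until its posterior first reaches a decision threshold. The function name `machine_response_time`, the `threshold` value, and the 10 ms frame step are all hypothetical.

```python
from typing import Optional, Sequence


def machine_response_time(
    posteriors: Sequence[Sequence[float]],
    target: int,
    onset_frame: int,
    threshold: float = 0.5,   # hypothetical decision criterion
    frame_ms: float = 10.0,   # hypothetical frame step in milliseconds
) -> Optional[float]:
    """Hypothetical machine analogue of a phoneme-detection response time.

    posteriors[t][k] is assumed to be the recognizer's posterior probability
    of phoneme class k at frame t. Returns the time in ms from the target's
    onset until its posterior first reaches `threshold`, or None if it never does.
    """
    if not 0 <= onset_frame < len(posteriors):
        raise ValueError("onset_frame out of range")
    for t in range(onset_frame, len(posteriors)):
        if posteriors[t][target] >= threshold:
            # The detecting frame must itself be processed, so count it.
            return (t - onset_frame + 1) * frame_ms
    # Never detected: comparable to a miss in a human detection experiment.
    return None


# Toy posteriors for two classes over four frames (invented numbers).
posteriors = [[0.9, 0.1], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]
print(machine_response_time(posteriors, target=1, onset_frame=1))  # 20.0
```

A threshold-crossing latency like this resembles a human detection response in one respect: both are measured in time and both require a decision criterion. That property is what would allow per-phoneme means to be set beside human reaction times. A different decision rule, such as a Viterbi-based one, could be substituted without changing the comparison.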
