Evaluation of word confidence for speech recognition systems

Abstract: Confidence measures enable us to assess the output of a speech recognition system. A confidence measure provides an estimate of the probability that a word in the recognizer output is correct. In this paper we discuss ways to quantify the performance of confidence measures in terms of their discrimination power and bias. In particular, we analyze two performance metrics: the classification equal error rate and the normalized mutual information metric. We then report experimental results of using these metrics to compare four confidence measure estimation schemes. We also discuss the relationship between these metrics and the operating point of the speech recognition system, and develop an approach to the robust estimation of normalized mutual information.
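
To make the two metrics concrete, the following is a minimal sketch rather than the paper's own formulation. It assumes that normalized mutual information takes the widely used normalized cross entropy form, in which the baseline is a constant confidence equal to the overall fraction of correct words, and that the equal error rate is approximated by sweeping a threshold over the observed confidence values. The function names, the clipping constant `eps`, and the toy data are illustrative assumptions.

```python
import math


def normalized_cross_entropy(confidences, labels, eps=1e-12):
    """Assumed NCE form: (H_max + log-likelihood) / H_max, in bits.

    confidences: estimated P(word is correct), one value per word.
    labels: 1 if the word is correct, 0 if it is incorrect.
    """
    n = len(labels)
    n_correct = sum(labels)
    if n_correct == 0 or n_correct == n:
        raise ValueError("need at least one correct and one incorrect word")
    p_c = n_correct / n  # prior probability that a word is correct
    # Cost of the baseline that assigns the prior p_c to every word.
    h_max = -(n_correct * math.log2(p_c) + (n - n_correct) * math.log2(1 - p_c))
    log_lik = 0.0
    for p, y in zip(confidences, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        log_lik += math.log2(p) if y else math.log2(1 - p)
    return (h_max + log_lik) / h_max


def equal_error_rate(confidences, labels):
    """Approximate EER: a word is accepted as correct when confidence >= threshold."""
    n_correct = sum(labels)
    n_incorrect = len(labels) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("need at least one correct and one incorrect word")
    pairs = list(zip(confidences, labels))
    best_gap, best_rate = None, None
    for t in sorted(set(confidences)) + [float("inf")]:
        # False acceptance: incorrect words accepted; false rejection: correct words rejected.
        fa = sum(1 for p, y in pairs if p >= t and not y) / n_incorrect
        fr = sum(1 for p, y in pairs if p < t and y) / n_correct
        gap = abs(fa - fr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fa + fr) / 2
    return best_rate


# Toy data (hypothetical): three correct words and one incorrect word.
labels = [1, 1, 1, 0]
prior_only = [0.75, 0.75, 0.75, 0.75]  # uninformative: every word gets the prior
sharp = [0.9, 0.8, 0.7, 0.1]           # separates correct from incorrect words
nce_prior = normalized_cross_entropy(prior_only, labels)
nce_sharp = normalized_cross_entropy(sharp, labels)
eer_sharp = equal_error_rate(sharp, labels)
```

Under these assumptions the two metrics respond to different properties. The equal error rate depends only on how the confidences rank correct against incorrect words, so it reflects discrimination power and is unchanged by any strictly increasing rescaling of the scores. The normalized cross entropy also penalizes confidences that are systematically too high or too low, so it is sensitive to bias as well.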
