A probabilistic approach to confidence estimation and evaluation

In this paper we propose a novel way of estimating confidences for words recognized by a speech recognition system, together with a natural methodology for evaluating the overall quality of those confidence estimates. Our approach interprets a confidence as the probability that the corresponding recognized word is correct, and uses generalized linear models to combine various predictor scores into confidence estimates. Experimental results using these models are presented for four sources of speech data: Switchboard, Spanish and Mandarin CallHome, and Wall Street Journal.
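
As an illustration of the idea only, the following minimal sketch fits one kind of generalized linear model, a logistic regression, that maps predictor scores to the probability that a recognized word is correct. Everything specific here is an assumption rather than the paper's setup: the logit link, the two synthetic predictor scores, the generating weights in `TRUE_W`, and the use of mean cross entropy against a constant-prior baseline as the quality measure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (inverse of the logit link).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def confidence(w, x):
    # Confidence = estimated P(word is correct | predictor scores x).
    # w[0] is the bias; w[1:] are the weights on the predictor scores.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    # Batch gradient ascent on the Bernoulli log-likelihood.
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = yi - confidence(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def mean_cross_entropy(probs, labels, eps=1e-12):
    # Average negative log-probability assigned to the true outcomes;
    # lower values mean better probability estimates.
    total = 0.0
    for p, c in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if c else math.log(1.0 - p)
    return total / len(labels)


# Synthetic data (assumption): two hypothetical predictor scores per word,
# with correctness drawn from a known logistic model.
rng = random.Random(0)
TRUE_W = [0.5, 2.0, -1.0]
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(500)]
y = [1 if rng.random() < confidence(TRUE_W, x) else 0 for x in X]

X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w = fit_logistic(X_train, y_train)
test_probs = [confidence(w, x) for x in X_test]

# Baseline: give every word the same confidence, the training-set accuracy.
prior = sum(y_train) / len(y_train)
model_ce = mean_cross_entropy(test_probs, y_test)
baseline_ce = mean_cross_entropy([prior] * len(y_test), y_test)

print("fitted weights:", [round(v, 2) for v in w])
print("held-out cross entropy, model:", round(model_ce, 3))
print("held-out cross entropy, constant prior:", round(baseline_ce, 3))
```

Because the confidences are treated as probabilities, they can be scored directly against the observed correct/incorrect outcomes. Comparing against the constant-prior baseline shows how much the predictor scores add beyond simply knowing the overall word accuracy.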
