Acoustical and lexical based confidence measures for a very large vocabulary telephone speech hypothesis-verification system

In the context of large vocabulary speech recognition systems, it is of major interest to classify every utterance as either correctly or incorrectly recognised. In this paper we present a preliminary study of a word-level confidence estimation system based on the output of a neural network (NN). The NN combines multiple features extracted from the acoustical and lexical decoders of our reference system, namely those available in the hypothesis stage of a very large vocabulary telephone speech recognition system with a hypothesis-verification architecture. We describe the system architecture, the experiments that led to the selection of the parameter set used by the NN, and the final performance, which shows promising results compared with standard log-likelihood ratio techniques for confidence scoring.
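The abstract contrasts an NN that combines several decoder features with a standard log-likelihood ratio (LLR) confidence score. The Python sketch below illustrates both ideas under stated assumptions. The feature set (frame-normalised LLR, a normalised lexical-preselection rank, log word duration), the filler-model form of the LLR, the network shape, the tanh/sigmoid activations and all weight values are hypothetical placeholders. They are not the paper's selected parameters or its trained network.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function, mapping a real score into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def llr_confidence(logp_word: float, logp_filler: float, n_frames: int) -> float:
    """Baseline score: log-likelihood ratio between the recognised word model
    and an assumed competing (filler) model, normalised by word duration."""
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (logp_word - logp_filler) / n_frames


def mlp_confidence(
    features: Sequence[float],
    w_hidden: Sequence[Sequence[float]],
    b_hidden: Sequence[float],
    w_out: Sequence[float],
    b_out: float,
) -> float:
    """One-hidden-layer network: tanh hidden units and a sigmoid output read
    as an estimate of P(word is correctly recognised)."""
    if len(w_hidden) != len(b_hidden) or len(w_hidden) != len(w_out):
        raise ValueError("inconsistent number of hidden units")
    hidden: List[float] = []
    for row, bias in zip(w_hidden, b_hidden):
        if len(row) != len(features):
            raise ValueError("weight row length must match feature count")
        hidden.append(math.tanh(sum(w * x for w, x in zip(row, features)) + bias))
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def accept(confidence: float, threshold: float) -> bool:
    """Verification decision: keep the word hypothesis if confidence >= threshold."""
    return confidence >= threshold


if __name__ == "__main__":
    # Hypothetical per-word features: [frame-normalised LLR,
    # normalised lexical-preselection rank (0 = best), log duration in frames].
    llr = llr_confidence(logp_word=-820.0, logp_filler=-860.0, n_frames=50)
    feats = [llr, 0.1, math.log(50)]
    # Placeholder weights for 2 hidden units (untrained, illustration only).
    w_h = [[1.5, -2.0, 0.2], [-0.5, 1.0, 0.1]]
    b_h = [0.0, -0.3]
    w_o = [2.0, -1.0]
    conf = mlp_confidence(feats, w_h, b_h, w_o, b_out=-0.5)
    print(f"LLR baseline: {llr:.3f}  accepted: {accept(llr, 0.5)}")
    print(f"NN confidence: {conf:.3f}  accepted: {accept(conf, 0.5)}")
```

In a real system the weights would be trained on recognised words labelled as correct or incorrect, and the acceptance threshold would be tuned to the desired balance of false acceptances and false rejections. Neither step is shown here.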