Progressive early decision of speech recognition results by comparing most likely word sequences

In continuous speech recognition, the most likely word sequence determined at the end of an utterance is the optimal recognition result for the entire utterance. Depending on the application, however, the delay between the utterance and the determination of the recognition result can be a practical problem, so recognition results must be decided progressively while the utterance is still in progress. With a one-pass search algorithm, results can be decided early by detecting past sole paths during the search, that is, points in the past through which all surviving hypotheses pass. No comparably effective early decision scheme exists for multiple-pass search. This paper therefore proposes a scheme that decides recognition results progressively by successively comparing the most likely word sequence obtained during the utterance with previously obtained most likely word sequences, and applies it to both a one-pass and a two-pass decoder. The scheme aims to shorten word decision delays while limiting degradation of the recognition rate by controlling two parameters: the word decision margin and the interval at which the most likely word sequences are obtained. In broadcast news recognition experiments with a one-pass decoder, the proposed scheme achieved the same average word decision delay as the past sole path detection method without significantly degrading word recognition accuracy. With a two-pass decoder, it decided recognition results progressively with an average word decision delay of about 0.5 s.

© 2003 Wiley Periodicals, Inc. Syst Comp Jpn, 34(14): 73–82, 2003; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.10193
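The comparison-based early decision idea summarized above can be illustrated with a short sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: `EarlyDecider` and `common_prefix_len` are hypothetical names; the word decision margin is assumed to be counted in words (the paper may define it differently, for example in time); a word is assumed to be decided once it lies in the common prefix of two successive most likely word sequences and more than `margin` words before the end of that prefix; and the caller is assumed to invoke `update` once per decision interval with the decoder's current best word sequence.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading words shared by two word sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class EarlyDecider:
    """Hypothetical helper that decides words progressively by comparing the
    current most likely word sequence with the previous one.

    Assumptions (not taken from the paper): the margin is counted in words,
    a word is decided once it lies inside the common prefix of two successive
    best hypotheses and at least `margin` words before the end of that prefix,
    and decisions are never revoked.
    """

    def __init__(self, margin: int = 1) -> None:
        self.margin = margin
        self.prev: List[str] = []      # best hypothesis from the previous interval
        self.decided: List[str] = []   # words committed so far

    def update(self, best: Sequence[str]) -> List[str]:
        """Feed the best word sequence for this interval; return newly decided words."""
        best = list(best)
        # Words agreed on by two successive hypotheses, minus the safety margin.
        stable = common_prefix_len(self.prev, best) - self.margin
        newly: List[str] = []
        # Extend decisions only if the new hypothesis agrees with what is already decided.
        if stable > len(self.decided) and best[: len(self.decided)] == self.decided:
            newly = best[len(self.decided):stable]
            self.decided.extend(newly)
        self.prev = best
        return newly


if __name__ == "__main__":
    decider = EarlyDecider(margin=1)
    # Best hypotheses obtained at successive intervals (made-up example).
    partials = [
        ["the"],
        ["the", "news"],
        ["the", "news", "today"],
        ["the", "new", "stay"],
        ["the", "new", "stay", "is"],
    ]
    for hyp in partials:
        print(decider.update(hyp))
    # Prints [], [], ['the'], [], ['new'] on successive lines
```

In the demo, the margin keeps "news" undecided, so the later revision to "new stay" costs nothing. A larger `margin`, or a longer interval between `update` calls, delays decisions but makes it less likely that a word is committed before a later hypothesis would have revised it, mirroring the delay/accuracy trade-off described in the abstract.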
