An improved search algorithm using incremental knowledge for continuous speech recognition

A search algorithm that incrementally makes effective use of detailed knowledge sources is proposed. The algorithm applies all available acoustic and linguistic information incrementally in three search phases. Phase one is a left-to-right Viterbi beam search that produces word end times and scores using right-context between-word models with a bigram language model. Phase two, guided by the results of phase one, is a right-to-left Viterbi beam search that produces word begin times and scores based on left-context between-word models. Phase three is an A* search that combines the results of phases one and two with a long-distance language model. The objective is to maximize recognition accuracy with a minimal increase in computational cost. Using this decomposed, incremental search algorithm, it is shown that early use of detailed acoustic models can significantly reduce the recognition error rate with a negligible increase in computational cost. It is demonstrated that the early use of detailed knowledge can improve the word error bound by at least 22% for large-vocabulary, speaker-independent, continuous speech recognition.
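The three-phase decomposition can be illustrated on a toy word lattice. The following is a minimal Python sketch under heavy simplifying assumptions, not the paper's implementation. The lattice of candidate word segments, the bigram table, the beam width, and the `long_distance_penalty` function are all hypothetical. Acoustic scores are fixed per word segment rather than computed from right- or left-context between-word models, and all scores are costs (negative log-probabilities, lower is better). Phase one is a left-to-right Viterbi beam search with the bigram model. Phase two is a right-to-left pass restricted to phase-one survivors, producing exact bigram completion costs. Phase three is an A* search that rescores with a stand-in long-distance model and uses the phase-two costs as its heuristic. The long-distance model is assumed to add a non-negative correction to the bigram cost, which keeps the heuristic admissible; a real long-distance model gives no such guarantee.

```python
import heapq

# Hypothetical toy lattice: (begin_frame, end_frame, word, acoustic_cost).
EDGES = [
    (0, 2, "show", 1.0), (0, 2, "so", 1.4),
    (2, 4, "me", 0.8), (2, 4, "my", 1.1),
    (4, 7, "flights", 1.2), (4, 7, "lights", 0.3),
]
T = 7  # final frame of the utterance

BIGRAM = {("<s>", "show"): 0.5, ("<s>", "so"): 1.0, ("show", "me"): 0.3,
          ("so", "me"): 1.5, ("me", "flights"): 0.7, ("me", "lights"): 1.3,
          ("my", "lights"): 0.6, ("flights", "</s>"): 0.2, ("lights", "</s>"): 0.4}

def bigram(prev, w):
    # Hypothetical bigram cost; unseen pairs get a flat back-off cost of 3.0.
    return BIGRAM.get((prev, w), 3.0)

def long_distance_penalty(hist, w):
    # Hypothetical stand-in for a long-distance LM: a NON-NEGATIVE correction
    # conditioned on the two previous words, so phase-two costs stay admissible.
    return 0.9 if hist[-2:] == ("show", "me") and w == "lights" else 0.0

def phase_one(beam):
    """Left-to-right Viterbi beam search; returns {(end_frame, word): best cost}."""
    fwd = {(0, "<s>"): 0.0}
    frames = sorted({b for b, _, _, _ in EDGES} | {T})
    for t in frames:
        at_t = {k: v for k, v in fwd.items() if k[0] == t}
        if not at_t:
            continue
        cutoff = min(at_t.values()) + beam
        for k, v in at_t.items():
            if v > cutoff:
                del fwd[k]  # beam pruning of word-end hypotheses at frame t
        for b, e, w, ac in EDGES:
            if b != t:
                continue
            for (_, prev), cost in at_t.items():
                if cost > cutoff:
                    continue
                new = cost + bigram(prev, w) + ac
                if new < fwd.get((e, w), float("inf")):
                    fwd[(e, w)] = new
    return fwd

def phase_two(fwd):
    """Right-to-left pass over phase-one survivors only.
    bwd[(t, w)] = best bigram cost to finish the sentence after word w ends at t."""
    bwd = {(e, w): bigram(w, "</s>") for (e, w) in fwd if e == T}
    for b, e, w, ac in sorted(EDGES, key=lambda edge: -edge[0]):
        if (e, w) not in bwd:
            continue  # pruned in phase one, or cannot reach the final frame
        for (t, prev) in fwd:
            if t == b:
                cost = bigram(prev, w) + ac + bwd[(e, w)]
                if cost < bwd.get((b, prev), float("inf")):
                    bwd[(b, prev)] = cost
    return bwd

def phase_three(bwd):
    """A* search: g uses bigram + long-distance penalty, h is the phase-two cost."""
    heap = [(bwd[(0, "<s>")], 0.0, 0, ("<s>",), False)]
    while heap:
        _, g, t, hist, done = heapq.heappop(heap)
        if done:
            return list(hist[1:]), g  # first complete hypothesis popped is optimal
        if t == T:
            end = g + bigram(hist[-1], "</s>")
            heapq.heappush(heap, (end, end, t, hist, True))
            continue
        for b, e, w, ac in EDGES:
            if b == t and (e, w) in bwd:
                g2 = g + ac + bigram(hist[-1], w) + long_distance_penalty(hist, w)
                heapq.heappush(heap, (g2 + bwd[(e, w)], g2, e, hist + (w,), False))
    return None

fwd = phase_one(beam=2.0)
bwd = phase_two(fwd)
words, cost = phase_three(bwd)
print(words, round(cost, 3))  # prints: ['show', 'me', 'flights'] 4.7
```

In this toy example, phase one prunes the word end `(4, "my")` from the beam, so phase two never expands it. The best bigram-only path is "show me lights" (phase-two cost 4.6 at `(0, "<s>")`), but the hypothetical long-distance penalty makes the A* search in phase three return "show me flights" instead. Because phase two supplies exact bigram completion costs, the A* search expands only hypotheses along the surviving lattice and stops as soon as a complete sentence reaches the top of the priority queue.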
