An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model

The stack decoder is an attractive algorithm for controlling the acoustic and language model matching in a continuous speech recognizer. A previous paper described a near-optimal admissible Viterbi A* search algorithm for use with non-cross-word acoustic models and no-grammar language models [16]. This paper extends that algorithm to include unigram language models and describes a modified version that includes the full (forward) decoder, cross-word acoustic models, and longer-span language models. The resulting algorithm is not admissible, but it has been demonstrated to have a low probability of search error and to be very efficient.
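To make the search idea concrete, the following is a minimal, generic sketch of a stack (best-first, A*) decoder over a toy word lattice with a unigram language model. It is not the algorithm of this paper: the function name `stack_decode`, the lattice format `arcs`, and all scores are hypothetical and chosen only for illustration. The heuristic is an acoustic-only upper bound on the remaining score, which is admissible here only because unigram log-probabilities are never positive.

```python
import heapq
import math


def stack_decode(arcs, unigram_logp, num_frames):
    """Best-first (A*) search over a toy word lattice.

    arcs: dict mapping start frame -> list of (end_frame, word, acoustic_logp)
    unigram_logp: dict mapping word -> log probability (assumed <= 0)
    Returns (word_list, total_log_score), or (None, -inf) if no path exists.
    """
    # Backward pass: h[t] is the best acoustic-only score from frame t to
    # the end. Since LM log-probabilities are <= 0, h[t] never
    # underestimates the true remaining score (admissible in this toy).
    h = [-math.inf] * (num_frames + 1)
    h[num_frames] = 0.0
    for t in range(num_frames - 1, -1, -1):
        for end, _word, ac in arcs.get(t, []):
            h[t] = max(h[t], ac + h[end])

    # heapq is a min-heap, so the priority is the negated estimate g + h.
    stack = [(-h[0], 0.0, 0, ())]
    while stack:
        _prio, g, t, words = heapq.heappop(stack)
        if t == num_frames:
            # First complete hypothesis popped is the best one.
            return list(words), g
        for end, word, ac in arcs.get(t, []):
            if h[end] == -math.inf:
                continue  # no path from this arc to the end of the input
            g_new = g + ac + unigram_logp[word]
            heapq.heappush(
                stack, (-(g_new + h[end]), g_new, end, words + (word,))
            )
    return None, -math.inf


# Hypothetical toy lattice over 7 frames; all scores are made up.
TOY_ARCS = {
    0: [(2, "wreck", -1.0), (5, "recognize", -4.0)],
    2: [(3, "a", -0.5)],
    3: [(5, "nice", -1.0)],
    5: [(7, "speech", -1.0), (7, "beach", -0.9)],
}
TOY_UNIGRAM = {
    "wreck": -3.0, "a": -1.0, "nice": -2.0,
    "recognize": -2.0, "speech": -1.5, "beach": -2.5,
}

if __name__ == "__main__":
    print(stack_decode(TOY_ARCS, TOY_UNIGRAM, 7))
```

In this toy lattice the acoustically best path is "wreck a nice beach", but once the unigram scores are added the search returns "recognize speech". The language model changes the ranking while the acoustic-only bound still guides the order in which partial hypotheses are popped.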

[1] Lalit R. Bahl, et al. A Maximum Likelihood Approach to Continuous Speech Recognition, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Nils J. Nilsson, et al. Problem-solving methods in artificial intelligence, 1971, McGraw-Hill computer science series.

[3] James Glass, et al. Integration of speech recognition and natural language processing in the MIT VOYAGER system, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[4] Steve Austin, et al. The forward-backward search algorithm, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[5] W. W. Bledsoe, et al. Review of "Problem-Solving Methods in Artificial Intelligence by Nils J. Nilsson", McGraw-Hill Pub., 1971, SGAR.

[6] Volker Steinbiss, et al. Sentence-hypotheses generation in a continuous-speech recognition system, 1989, EUROSPEECH.

[7] Douglas B. Paul. The Lincoln tied-mixture HMM continuous speech recognizer, 1990.

[8] Robert Roth, et al. A Rapid Match Algorithm for Continuous Speech Recognition, 1990, HLT.

[9] D. O'Shaughnessy, et al. A*-admissible heuristics for rapid lexical access, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[10] Donald E. Knuth, et al. The art of computer programming: sorting and searching (volume 3), 1973.

[11] Douglas B. Paul. New Results with the Lincoln Tied-Mixture HMM CSR System, 1991, HLT.

[12] F. Jelinek. Fast sequential decoding algorithm using a stack, 1969.

[13] Bruce T. Lowerre, et al. The HARPY speech recognition system, 1976.

[14] R. Schwartz, et al. A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses, 1991, ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[15] Lalit R. Bahl, et al. A fast approximate acoustic match for large vocabulary speech recognition, 1989, IEEE Trans. Speech Audio Process.

[16] Douglas B. Paul, et al. Speech Recognition Using Hidden Markov Models, 1990.

[17] Frank K. Soong, et al. A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition, 1990, HLT.

[18] Dimitri Kanevsky, et al. Matrix fast match: a fast method for identifying a short list of candidate words for decoding, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[19] Patti Price, et al. The DARPA 1000-word resource management database for continuous speech recognition, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[20] A. Poritz, et al. Hidden Markov models: a guided tour, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.

[21] Douglas B. Paul, et al. Algorithms for an Optimal A* Search and Linearizing the Search in the Stack Decoder, 1991, HLT.

[22] Douglas B. Paul. A CSR-NL Interface Specification Version 1.51, 1989, HLT.