N‐best breadth search for large vocabulary continuous speech recognition using a long span language model

In a large‐vocabulary continuous speech recognition system, high‐level linguistic knowledge can enhance performance. However, integrating high‐level linguistic knowledge and complex acoustic models under an efficient search scheme is still problematic. Higher‐order n‐grams are so computationally expensive, especially when the vocabulary is large, that real‐time processing is not yet possible. In this report, the n‐best breadth search algorithm is proposed within the framework of state‐space search; it can handle higher‐order n‐grams and complex subword acoustic models such as cross‐word triphones. The n‐best breadth search is a combination of best‐first search and breadth‐first search. The proposed algorithm can be extended to handle other types of language models, such as stochastic context‐free grammars, and other types of acoustic models, including neural networks. Compared with the conventional beam‐search method, this pilot experiment shows that the proposed algo...
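The abstract describes the n‐best breadth search only as a combination of best‐first and breadth‐first search over a state space. The Python sketch below is one possible reading of that description, not the authors' algorithm. It assumes that partial hypotheses are extended one word position at a time (the breadth‐first part), that after each position only the N highest‐scoring hypotheses survive (the best‐first part), and that every word position comes with a precomputed acoustic log‐likelihood per word. The function `n_best_breadth_search`, the toy trigram table `TRIGRAM` and the acoustic scores are all hypothetical. Under these simplifications the sketch behaves like a fixed‐width beam search, so the method in the report presumably differs in details that the abstract does not give.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A partial hypothesis: (accumulated log score, word history).
Hyp = Tuple[float, Tuple[str, ...]]


def n_best_breadth_search(
    vocab: Sequence[str],
    acoustic: List[Dict[str, float]],
    lm_logprob: Callable[[Tuple[str, ...], str], float],
    n_best: int,
) -> List[Hyp]:
    """Hypothetical sketch: expand every surviving hypothesis by one word
    (breadth-first), then keep only the n_best highest scores (best-first)."""
    frontier: List[Hyp] = [(0.0, ())]
    for step_scores in acoustic:
        expanded: List[Hyp] = []
        for score, history in frontier:
            for word in vocab:
                # The LM may condition on the whole stored history (long span);
                # the acoustic term is an assumed per-position log-likelihood.
                new_score = score + lm_logprob(history, word) + step_scores[word]
                expanded.append((new_score, history + (word,)))
        # Prune this depth to the n_best best partial hypotheses.
        frontier = heapq.nlargest(n_best, expanded, key=lambda h: h[0])
    # Return the surviving complete hypotheses, best first.
    return sorted(frontier, key=lambda h: h[0], reverse=True)


# Hypothetical trigram probabilities keyed by (last two words, next word).
TRIGRAM = {
    ((), "the"): 0.6,
    (("the",), "cat"): 0.5,
    (("the", "cat"), "sat"): 0.7,
}


def toy_trigram(history: Tuple[str, ...], word: str) -> float:
    """Trigram log-probability; unseen events get a fixed floor of 0.05."""
    return math.log(TRIGRAM.get((history[-2:], word), 0.05))


# Hypothetical vocabulary and per-position acoustic log-likelihoods.
VOCAB = ["the", "cat", "sat", "mat"]
ACOUSTIC = [
    {"the": -1.0, "cat": -3.0, "sat": -3.0, "mat": -3.0},
    {"the": -3.0, "cat": -1.0, "sat": -2.5, "mat": -1.2},
    {"the": -3.0, "cat": -3.0, "sat": -1.0, "mat": -1.5},
]

if __name__ == "__main__":
    for score, words in n_best_breadth_search(VOCAB, ACOUSTIC, toy_trigram, n_best=3):
        print(f"{score:8.3f}  {' '.join(words)}")
```

Because each hypothesis carries its full word history, swapping `toy_trigram` for a higher‐order n‐gram only changes the scoring callback, not the search loop. A real decoder would also have to align words to acoustic frames (for example, HMM state sequences for cross‐word triphones), which this sketch deliberately omits.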
