Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies

This paper deals with search algorithms for real-time speech recognition. We argue that software-only speech recognition has several critical advantages over using special-purpose or parallel hardware. We present a history of several advances in search algorithms which, together, have made it possible to implement real-time recognition of large vocabularies on a single workstation without any hardware accelerators. We discuss the Forward-Backward Search algorithm in detail, as this is the key algorithm that has made real-time recognition of very large vocabularies possible. As a result, we can recognize continuous speech with a vocabulary of 20,000 words strictly in real time, entirely in software, on a high-end workstation with large memory. We demonstrate that the computation needed grows as the cube root of the vocabulary size.
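
To give a feel for the cube-root claim, the short sketch below computes how much the computation would grow when the vocabulary is enlarged, assuming the cost is exactly proportional to the cube root of the vocabulary size. The function name `relative_cost` and the 1,000-word baseline are illustrative assumptions, not figures from the paper.

```python
def relative_cost(vocab_size: int, baseline_vocab: int) -> float:
    """Ratio of search cost at vocab_size to cost at baseline_vocab.

    Assumes cost is proportional to vocab_size ** (1/3), which is the
    scaling stated in the abstract; constant factors cancel in the ratio.
    """
    if vocab_size <= 0 or baseline_vocab <= 0:
        raise ValueError("vocabulary sizes must be positive")
    return (vocab_size / baseline_vocab) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical baseline of 1,000 words, compared with the 20,000-word
    # vocabulary mentioned in the abstract.
    ratio = relative_cost(20_000, 1_000)
    print(f"20x larger vocabulary -> about {ratio:.2f}x the computation")
```

Under this assumption, a twentyfold increase in vocabulary costs less than three times the computation, which is why very large vocabularies remain feasible on a single workstation.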
