Start-synchronous search for large vocabulary continuous speech recognition

In this paper, we present a novel, efficient search strategy for large vocabulary continuous speech recognition. The search algorithm, based on a stack decoder framework, utilizes phone-level posterior probability estimates (produced by a connectionist/hidden Markov model acoustic model) as a basis for phone deactivation pruning-a highly efficient method of reducing the required computation. The single-pass algorithm is naturally factored into the time-asynchronous processing of the word sequence and the time-synchronous processing of the hidden Markov model state sequence. This enables the search to be decoupled from the language model while still maintaining the computational benefits of time-synchronous processing. The incorporation of the language model in the search is discussed and computationally cheap approximations to the full language model are introduced. Experiments were performed on the North American Business News task using a 60000 word vocabulary and a trigram language model. Results indicate that the computational cost of the search may be reduced by more than a factor of 40 with a relative search error of less than 2% using the techniques discussed in the paper.

[1]  Hermann Ney,et al.  A word graph algorithm for large vocabulary continuous speech recognition , 1994, Comput. Speech Lang..

[2]  Hermann Ney,et al.  Improvements in beam search for 10000-word continuous-speech recognition , 1994, IEEE Trans. Speech Audio Process..

[3]  J. Foote,et al.  WSJCAM0: A BRITISH ENGLISH SPEECH CORPUS FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION , 1995 .

[4]  Steve Renals,et al.  WSJCAMO: a British English speech corpus for large vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[5]  Nils J. Nilsson,et al.  Problem-solving methods in artificial intelligence , 1971, McGraw-Hill computer science series.

[6]  Bruce Lowerre,et al.  The Harpy speech understanding system , 1990 .

[7]  D. O'Shaughnessy,et al.  A*-admissible heuristics for rapid lexical access , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[8]  Hervé Bourlard,et al.  Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems , 1997, EUROSPEECH.

[9]  Hervé Bourlard,et al.  Connectionist probability estimators in HMM speech recognition , 1994, IEEE Trans. Speech Audio Process..

[10]  Anthony J. Robinson,et al.  An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.

[11]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  F. Jelinek Fast sequential decoding algorithm using a stack , 1969 .

[13]  Hermann Ney,et al.  Large vocabulary continuous speech recognition of Wall Street Journal data , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[14]  Volume Assp,et al.  ACOUSTICS. SPEECH. AND SIGNAL PROCESSING , 1983 .

[15]  Hermann Ney,et al.  Data driven search organization for continuous speech recognition , 1992, IEEE Trans. Signal Process..

[16]  Steve Renals,et al.  THE USE OF RECURRENT NEURAL NETWORKS IN CONTINUOUS SPEECH RECOGNITION , 1996 .

[17]  H. Ney,et al.  Improvements in beam search for 10000-word continuous speech recognition , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[18]  Long Nguyen,et al.  Multiple-Pass Search Strategies , 1996 .

[19]  Kuldip K. Paliwal,et al.  Automatic Speech and Speaker Recognition: Advanced Topics , 1999 .

[20]  Mosur Ravishankar,et al.  Efficient Algorithms for Speech Recognition. , 1996 .

[21]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[22]  Hervé Bourlard,et al.  Optimizing recognition and rejection performance in wordspotting systems , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  Lalit R. Bahl,et al.  Design of a linguistic statistical decoder for the recognition of continuous speech , 1975, IEEE Trans. Inf. Theory.

[24]  S. Renals,et al.  Phone deactivation pruning in large vocabulary continuous speech recognition , 1996, IEEE Signal Processing Letters.

[25]  Teuvo Kohonen,et al.  Speech recognition: a hybrid approach , 1998 .

[26]  Lalit R. Bahl,et al.  Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition , 1975, IEEE Trans. Inf. Theory.

[27]  R. Schwartz,et al.  A comparison of several approximate algorithms for finding multiple (N-best) sentence hypotheses , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[28]  Lalit R. Bahl,et al.  A tree search strategy for large-vocabulary continuous speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[29]  Jj Odell,et al.  The Use of Context in Large Vocabulary Speech Recognition , 1995 .

[30]  A. Poritz,et al.  Hidden Markov models: a guided tour , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[31]  P.S. Gopalakrishnan,et al.  A channel-bank-based phone detection strategy , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[32]  Douglas B. Paul An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model , 1992, HLT.

[33]  Anthony J. Robinson,et al.  Context-Dependent Classes in a Hybrid Recurrent Network-HMM Speech Recognition System , 1995, NIPS.

[34]  Frank K. Soong,et al.  A Tree.Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition , 1990, HLT.

[35]  P. Mermelstein,et al.  Fast search strategy in a large vocabulary word recognizer , 1988 .

[36]  Steve Renals,et al.  Recent improvements to the ABBOT large vocabulary CSR system , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[37]  Lalit R. Bahl,et al.  A fast approximate acoustic match for large vocabulary speech recognition , 1989, IEEE Trans. Speech Audio Process..

[38]  T. K. Vintsyuk Element-wise recognition of continuous speech composed of words from a specified dictionary , 1971, CYBERNETICS.

[39]  Douglas B. Paul,et al.  An Efficient A* Stack Decoder Algorithm for Continuous Speech Recognition with a Stochastic Language Model , 1992, HLT.

[40]  Christoph Neukirchen,et al.  Efficient search with posterior probability estimates in HMM-based speech recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).