From within-word model search to across-word model search in large vocabulary continuous speech recognition

In this paper we report on the application of across-word context-dependent acoustic phoneme models in a single-pass large vocabulary continuous speech recognizer. Although across-word models are used by many groups today, publications usually give only an outline of the recognizers, and implementation details are often missing. We present both a formal derivation of across-word model search and a detailed description of our implementation. The across-word model system is compared with a conventional within-word model system with respect to word error rate and computational effort. Compared to the baseline within-word system, a straightforward implementation of across-word model search results in a substantial increase in computational effort. Therefore, we study several optimization steps that result in a more efficient organization of the search space and more efficient pruning. The effects of these optimizations are analysed by detailed profiling. In combination, they accelerate the straightforward implementation of across-word model search by nearly a factor of three. In addition, we discuss the construction of word graphs during across-word model search. Starting from a word graph method based on within-word model search, we derive a formal specification of across-word word graphs. We show that the resulting word graphs are a good representation of the active search space.
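
To make the distinction between within-word and across-word context-dependent models concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes triphone models, and the toy lexicon, the phone symbols, the boundary symbol `#`, and all function names are hypothetical. It labels each phone with its left and right context, first stopping at word boundaries and then looking across them, and counts how many right contexts a word-final phone can see, which hints at why a straightforward across-word search has more hypotheses to expand at word ends.

```python
from typing import Dict, List, Set, Tuple

Triphone = Tuple[str, str, str]  # (left context, phone, right context)
BOUNDARY = "#"  # hypothetical symbol meaning "no phone context available"

# Hypothetical toy pronunciation lexicon (assumption, not from the paper).
LEXICON: Dict[str, List[str]] = {
    "the": ["dh", "ah"],
    "cat": ["k", "ae", "t"],
    "sat": ["s", "ae", "t"],
}


def triphones(phones: List[str]) -> List[Triphone]:
    """Label every phone with its immediate left and right neighbour."""
    padded = [BOUNDARY] + phones + [BOUNDARY]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def within_word(words: List[str], lexicon: Dict[str, List[str]]) -> List[Triphone]:
    """Contexts stop at word boundaries: each word is labelled in isolation."""
    result: List[Triphone] = []
    for word in words:
        result.extend(triphones(lexicon[word]))
    return result


def across_word(words: List[str], lexicon: Dict[str, List[str]]) -> List[Triphone]:
    """Contexts extend over word boundaries: label the whole phone sequence."""
    phones = [p for word in words for p in lexicon[word]]
    return triphones(phones)


def word_end_right_contexts(lexicon: Dict[str, List[str]]) -> Set[str]:
    """Right contexts a word-final phone may see when any word can follow."""
    return {pron[0] for pron in lexicon.values()}


if __name__ == "__main__":
    sentence = ["the", "cat"]
    print(within_word(sentence, LEXICON))
    print(across_word(sentence, LEXICON))
    print(sorted(word_end_right_contexts(LEXICON)))
```

In this sketch the within-word labelling gives the final phone of "the" the single model `("dh", "ah", "#")`, whereas the across-word labelling needs one variant per possible successor word's first phone. During search the successor word is not yet known when a word end is reached, so all such variants may have to be hypothesised in parallel.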
