Architecture and Search Organization for Large Vocabulary Continuous Speech Recognition

This paper gives an overview of an architecture and search organization for large vocabulary continuous speech recognition (LVCSR) at RWTH. In the first part of the paper, we describe the principle and architecture of an LVCSR system. In particular, the issues of modeling and search for phoneme-based recognition are discussed. In the second part, we review the word-conditioned lexical tree search algorithm from the viewpoint of how the search space is organized. Further, we extend this method to produce high-quality word graphs. Finally, we present some recognition results on the ARPA North American Business (NAB’94) task for a 64 000-word vocabulary (American English, continuous speech, speaker independent).
