The RWTH speech recognition system and spoken document retrieval

We present an overview of the RWTH Aachen large-vocabulary continuous speech recognizer. The recognizer is based on continuous-density hidden Markov models and a time-synchronous, left-to-right beam search strategy. Experimental results on the ARPA Wall Street Journal (WSJ) corpus verify the effects of several system components on recognition performance, namely linear discriminant analysis, vocal tract normalization, the pronunciation lexicon, and cross-word triphones. Finally, we discuss the extension of the recognition system towards spoken document retrieval.
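To make the search strategy concrete, the following is a minimal sketch of time-synchronous Viterbi decoding with beam pruning over a single left-to-right HMM. It is an illustration only and not the RWTH implementation: the function name `beam_viterbi`, the toy model, and the assumption that per-frame emission log-likelihoods are already computed are all hypothetical. A real recognizer would search a lexicon-structured network with language model scores.

```python
import math

NEG_INF = float("-inf")

def beam_viterbi(log_emit, log_trans, beam):
    """Time-synchronous Viterbi search with beam pruning (illustrative sketch).

    log_emit[t][s]  : log-likelihood of frame t in state s (assumed precomputed)
    log_trans[p][s] : log transition probability p -> s, NEG_INF if forbidden
    beam            : hypotheses scoring more than `beam` below the best are dropped
    Returns (best_final_log_score, best_state_path).
    """
    n_states = len(log_trans)
    # Left-to-right model: every hypothesis starts in state 0.
    active = {0: (log_emit[0][0], [0])}
    for t in range(1, len(log_emit)):
        expanded = {}
        # Extend all surviving hypotheses by exactly one frame (time-synchronous).
        for prev, (score, path) in active.items():
            for s in range(n_states):
                if log_trans[prev][s] == NEG_INF:
                    continue
                cand = score + log_trans[prev][s] + log_emit[t][s]
                # Viterbi recombination: keep only the best path into each state.
                if s not in expanded or cand > expanded[s][0]:
                    expanded[s] = (cand, path + [s])
        # Beam pruning relative to the best hypothesis at this frame.
        best = max(score for score, _ in expanded.values())
        active = {s: h for s, h in expanded.items() if h[0] >= best - beam}
    final_state = max(active, key=lambda s: active[s][0])
    return active[final_state]

# Toy two-state left-to-right model (hypothetical numbers).
toy_trans = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]
toy_emit = [[math.log(0.9), math.log(0.1)],
            [math.log(0.2), math.log(0.8)],
            [math.log(0.1), math.log(0.9)]]

score, path = beam_viterbi(toy_emit, toy_trans, beam=10.0)
print(path, score)
```

A narrower beam discards more hypotheses per frame, which reduces computation at the risk of pruning the path that would eventually have scored best.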
