Four-layer categorization scheme of fast GMM computation techniques in large vocabulary continuous speech recognition systems

Large vocabulary continuous speech recognition systems are known to be computationally intensive. A major bottleneck is the Gaussian mixture model (GMM) computation and various techniques have been proposed to address this problem. We present a systematic study of fast GMM computation techniques. As there are a large number of these and it is impractical to exhaustively evaluate all of them, we first categorized techniques into four layers and selected representative ones to evaluate in each layer. Based on this framework of study, we provide a detailed analysis and comparison of GMM computation techniques from the four-layer perspective and explore two subtle practical issues, 1) how different techniques can be combined effectively and 2) how beam pruning will affect the performance of GMM computation techniques. All techniques are evaluated in the CMU Communicator domain. We also compare their performance with others reported in the literature.

[1]  Roberto Bisiani,et al.  Sub-vector clustering to improve memory and speed performance of acoustic likelihood computation , 1997, EUROSPEECH.

[2]  Alexander I. Rudnicky,et al.  The carnegie mellon communicator corpus , 2002, INTERSPEECH.

[3]  Kiyohiro Shikano,et al.  Gaussian mixture selection using context-independent HMM , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[4]  Hermann Ney,et al.  Look-ahead techniques for fast beam search , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Satoshi Takahashi,et al.  Four-level tied-structure for efficient representation of acoustic modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[6]  Mark J. F. Gales,et al.  Use of Gaussian selection in large vocabulary continuous speech recognition using HMMS , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[7]  Enrico Bocchieri,et al.  Vector quantization for the efficient computation of continuous density likelihoods , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  A Waibel Co-Advisor,et al.  Fast Speaker Independent Large Vocabulary Continuous Speech Recognition , 1998 .

[9]  Richard M. Stern,et al.  THE 1999 CMU 10X REAL TIME BROADCAST NEWS TRANSCRIPTION SYSTEM , 1999 .