VLSI Architecture of GMM Processing and Viterbi Decoder for 60, 000-Word Real-Time Continuous Speech Recognition

We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74MHz) and 84.04% memory bandwidth reduction (549.91MB/s) for real-time 60-k word continuous speech recognition.