PAPER Special Section on Circuits and Design Techniques for Advanced Large Scale Integration

VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition

SUMMARY We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60,000-word real-time continuous speech recognition. The architecture combines a cache architecture that exploits the locality of speech recognition, beam pruning with a dynamic threshold, two-stage language model searching, a Gaussian Mixture Model (GMM) architecture parallelized at the mixture level and the frame level, a parallel Viterbi architecture, and pipelined operation between Viterbi transition and GMM processing. Results show that the architecture reduces the required operating frequency by 88.24% (66.74 MHz) and the memory bandwidth by 84.04% (549.91 MB/s) for real-time 60,000-word continuous speech recognition.
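As a rough software illustration of two of the techniques named in the summary, the sketch below performs one frame of a log-domain Viterbi transition followed by beam pruning. It is a minimal sketch under stated assumptions, not the hardware design: the summary does not say how the dynamic threshold is computed, so the rule used here (tighten the threshold whenever more than `max_active` states survive) is hypothetical, as are the function name `viterbi_step` and all parameter names.

```python
import math


def viterbi_step(prev_scores, transitions, emit_logprob, beam, max_active):
    """One frame of a log-domain Viterbi update followed by beam pruning.

    prev_scores:  dict state -> best log score at the previous frame
    transitions:  dict state -> list of (next_state, log transition prob)
    emit_logprob: dict state -> log output probability for this frame
                  (in a recognizer this would come from GMM evaluation)
    beam:         beam width in the log domain
    max_active:   hypothetical cap on the number of surviving states
    """
    # Viterbi transition: keep the best incoming path for each next state.
    new_scores = {}
    for state, score in prev_scores.items():
        for nxt, log_a in transitions.get(state, []):
            cand = score + log_a + emit_logprob[nxt]
            if cand > new_scores.get(nxt, -math.inf):
                new_scores[nxt] = cand
    if not new_scores:
        return new_scores

    # Beam pruning: drop states scoring more than `beam` below the best one.
    threshold = max(new_scores.values()) - beam
    survivors = {s: v for s, v in new_scores.items() if v >= threshold}

    # Assumed "dynamic" rule: if too many states survive, raise the
    # threshold to the score of the max_active-th best state.
    # States tied at that score are all kept.
    if len(survivors) > max_active:
        ranked = sorted(survivors.values(), reverse=True)
        threshold = ranked[max_active - 1]
        survivors = {s: v for s, v in survivors.items() if v >= threshold}
    return survivors


if __name__ == "__main__":
    half = math.log(0.5)
    transitions = {0: [(0, half), (1, half)], 1: [(1, half), (2, half)]}
    emit = {0: -1.0, 1: -2.0, 2: -10.0}
    print(viterbi_step({0: 0.0}, transitions, emit, beam=5.0, max_active=4))
```

A narrower `beam` or a smaller `max_active` leaves fewer active states to process in the next frame, at the risk of pruning the path that would eventually have scored best.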
