A 40 nm 144 mW VLSI processor for real-time 60-kWord continuous speech recognition

We have developed a low-power VLSI chip for 60-kWord real-time continuous speech recognition based on a Hidden Markov Model (HMM). Our implementation includes a cache architecture that exploits the locality of speech recognition, beam pruning with a dynamic threshold, two-stage language model searching, highly parallel Gaussian Mixture Model (GMM) computation based on the mixture level, a variable 50-frame look-ahead scheme, and elastic pipeline operation between the Viterbi transition and GMM processing. Results show that our implementation reduces bandwidth by 95% (to 70.86 MB/s) and the required operating frequency by 78% (to 126.5 MHz) for 60-kWord real-time continuous speech recognition. The test chip, fabricated in 40 nm CMOS technology and containing 1.9 M transistors for logic and 7.8 Mbit of on-chip memory, occupies an area of 2.2 mm × 2.5 mm. Measured data show a power consumption of 144 mW at 126.5 MHz and 1.1 V.
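To illustrate the kind of beam pruning with a dynamic threshold mentioned above, the following is a minimal software sketch of one log-domain Viterbi step followed by pruning. The abstract does not state how the threshold is adapted, so the rule used here (halving the beam width until the number of surviving states fits under a cap) is an assumption, as are all names and parameters (`viterbi_step_with_beam`, `beam`, `max_active`); it is not a description of the chip's hardware.

```python
import math


def viterbi_step_with_beam(prev_scores, transitions, emit_logp, beam, max_active):
    """One Viterbi time step in the log domain, then beam pruning.

    prev_scores: dict mapping active state -> best log score so far
    transitions: dict mapping state -> list of (next_state, log_transition_prob)
    emit_logp:   dict mapping state -> log emission score for the current frame
    beam:        initial beam width in log units (hypothetical parameter)
    max_active:  cap on surviving states (hypothetical parameter)
    Returns (surviving_scores, beam_width_actually_used).
    """
    # Viterbi transition: keep the best incoming score for each next state.
    new_scores = {}
    for state, score in prev_scores.items():
        for nxt, log_a in transitions.get(state, []):
            cand = score + log_a + emit_logp[nxt]
            if cand > new_scores.get(nxt, -math.inf):
                new_scores[nxt] = cand
    if not new_scores:
        return {}, beam

    best = max(new_scores.values())

    def prune(width):
        # Keep only states whose score is within `width` of the best score.
        return {s: v for s, v in new_scores.items() if v >= best - width}

    # Assumed dynamic-threshold rule: tighten the beam until the number of
    # active states fits under the cap (or the width becomes negligible).
    width = beam
    survivors = prune(width)
    while len(survivors) > max_active and width > 1e-6:
        width *= 0.5
        survivors = prune(width)
    return survivors, width
```

In this sketch, a tighter beam means fewer active states and therefore fewer transition and GMM evaluations in the next frame, which is the general reason pruning lowers workload; the quantitative reductions reported above come from the authors' design, not from this example.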