A 40-nm 168-mW 2.4×-real-time VLSI processor for 60-kWord continuous speech recognition

This paper describes a low-power VLSI chip for speaker-independent 60-kWord continuous speech recognition based on a context-dependent Hidden Markov Model (HMM). Our implementation includes a compression-decoding scheme that reduces the external memory bandwidth needed for Gaussian Mixture Model (GMM) computation, together with multi-path Viterbi transition units. We optimize the internal SRAM size by using the max-approximation GMM calculation and by adjusting the number of look-ahead frames. The test chip, fabricated in 40 nm CMOS technology, occupies 1.77 mm × 2.18 mm and contains 2.52 M logic transistors and 4.29 Mbit of on-chip memory. Measured results show that, compared to the previous work, our implementation reduces the operating frequency required for real-time 60-kWord continuous speech recognition by 34.2% (to 83.3 MHz) and the power consumption by 48.5% (to 74.14 mW). At its maximum operating point of 200 MHz and 1.1 V, the chip runs 2.4× faster than real time while consuming 168 mW.
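The abstract names the max-approximation GMM calculation without detailing it. As commonly understood, the approximation replaces the log-sum over mixture components with the score of the single best component, so per-component results need not be accumulated. The sketch below illustrates that general idea only; the diagonal-covariance assumption, the function names, and the sample values are illustrative and are not taken from the chip's implementation.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def component_scores(x, weights, means, variances):
    """Weighted log score of each mixture component: log w_m + log N(x)."""
    return [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]

def gmm_log_likelihood_exact(x, weights, means, variances):
    """Exact GMM log-likelihood using a numerically stable log-sum-exp."""
    scores = component_scores(x, weights, means, variances)
    peak = max(scores)
    return peak + math.log(sum(math.exp(s - peak) for s in scores))

def gmm_log_likelihood_max(x, weights, means, variances):
    """Max-approximation: keep only the best-scoring component."""
    return max(component_scores(x, weights, means, variances))

# Hypothetical two-component mixture over a 2-dimensional feature vector.
x = [0.5, -1.0]
weights = [0.6, 0.4]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]

exact = gmm_log_likelihood_exact(x, weights, means, variances)
approx = gmm_log_likelihood_max(x, weights, means, variances)
print(f"exact={exact:.4f} max-approx={approx:.4f}")
```

The approximation never exceeds the exact value, and it underestimates it by at most the logarithm of the number of components, which is why it is a common trade of accuracy for reduced computation and storage.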
