Paper information - 27.2 A 6mW 5K-Word real-time speech recognizer using WFST models

Hardware-accelerated speech recognition is needed to supplement today's cloud-based systems in power- and bandwidth-constrained scenarios such as wearable electronics. With efficient hardware speech decoders, client devices can seamlessly transition between cloud-based and local tasks depending on the availability of power and networking. Most previous efforts in hardware speech decoding [1-2] focused primarily on faster decoding rather than low-power devices operating at real-time speed. More recently, [3] demonstrated real-time decoding using 54mW and 82MB/s memory bandwidth, though their architectural optimizations are not easily generalized to the weighted finite-state transducer (WFST) models used by state-of-the-art software decoders. This paper presents a 6mW speech recognition ASIC that uses WFST search networks and performs end-to-end decoding from audio input to text output.

Anantha P. Chandrakasan | Michael Price | James Glass

[1] Wonyong Sung, et al. A Real-Time FPGA-Based 20 000-Word Speech Recognizer With Optimized DRAM Access, 2010, IEEE Transactions on Circuits and Systems I: Regular Papers.

[2] Rob A. Rutenbar, et al. A High-Rate, Low-Power, ASIC Speech Decoder Using Finite State Transducers, 2012, 2012 IEEE 23rd International Conference on Application-Specific Systems, Architectures and Processors.

[3] Wonyong Sung, et al. An FPGA implementation of speech recognition with weighted finite state transducers, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] I. Lee Hetherington, et al. PocketSUMMIT: small-footprint continuous speech recognition, 2007, INTERSPEECH.

[5] Shintaro Izumi, et al. A 40-nm 168-mW 2.4×-real-time VLSI processor for 60-kWord continuous speech recognition, 2012, Proceedings of the IEEE 2012 Custom Integrated Circuits Conference.