Paper information - Energy-efficient MFCC extraction architecture in mixed-signal domain for automatic speech recognition

This paper proposes a novel processing architecture for extracting Mel-Frequency Cepstrum Coefficients (MFCC) for automatic speech recognition. Inspired by the human ear, the architecture adopts energy-efficient analog-domain information processing in place of the energy-intensive Fourier Transform used in conventional digital-domain designs. Moreover, the proposed architecture extracts the acoustic features in the mixed-signal domain, which significantly reduces both the cost of the Analog-to-Digital Converter (ADC) and the computational complexity. Circuit-level simulation based on 180 nm CMOS technology shows an energy consumption of 2.4 nJ/frame and a processing time of 45.79 μs/frame. Compared with the state of the art, the proposed architecture achieves a 97.2% energy saving and a speedup of about 6.4×. In speech recognition simulation, the proposed MFCC features reach a classification accuracy of 99%.
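The reported figures can be put in context with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes that the 97.2% saving means the proposed energy is (1 - 0.972) of the baseline energy, and that the 6.4× speedup means the baseline frame time is 6.4 times the proposed one. The function names are hypothetical, and the average-power figure further assumes that the 2.4 nJ is spent evenly over the 45.79 μs of processing.

```python
# Figures quoted in the abstract.
ENERGY_PER_FRAME_NJ = 2.4    # nJ per frame (proposed architecture)
TIME_PER_FRAME_US = 45.79    # microseconds per frame (proposed architecture)
ENERGY_SAVING = 0.972        # fractional saving versus the compared design
SPEEDUP = 6.4                # approximate speedup versus the compared design


def implied_baseline(energy_nj, time_us, saving, speedup):
    """Back out the baseline energy and time implied by the quoted ratios.

    Assumption (not stated in the abstract): saving = 1 - proposed/baseline
    and speedup = baseline_time / proposed_time.
    """
    baseline_energy_nj = energy_nj / (1.0 - saving)
    baseline_time_us = time_us * speedup
    return baseline_energy_nj, baseline_time_us


def average_power_uw(energy_nj, time_us):
    """Average power while a frame is processed, in microwatts.

    nJ divided by microseconds gives milliwatts, so multiply by 1000.
    """
    return energy_nj / time_us * 1000.0


if __name__ == "__main__":
    base_e, base_t = implied_baseline(
        ENERGY_PER_FRAME_NJ, TIME_PER_FRAME_US, ENERGY_SAVING, SPEEDUP
    )
    print(f"implied baseline energy: {base_e:.1f} nJ/frame")
    print(f"implied baseline time:   {base_t:.1f} us/frame")
    print(f"average power (proposed): "
          f"{average_power_uw(ENERGY_PER_FRAME_NJ, TIME_PER_FRAME_US):.1f} uW")
```

Under these assumptions the compared design would consume roughly 86 nJ and 293 μs per frame, and the proposed architecture would draw on the order of 52 μW while processing a frame. These are derived estimates, not numbers reported by the authors.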
