Energy-efficient MFCC extraction architecture in mixed-signal domain for automatic speech recognition

This paper proposes a novel processing architecture to extract Mel-Frequency Cepstrum Coefficients (MFCC) for automatic speech recognition. Inspired by the human ear, the energy-efficient analog-domain information processing is adopted to replace the energy-intensive Fourier Transform in conventional digital-domain. Moreover, the proposed architecture extracts the acoustic features in the mixed-signal domain, which significantly reduces the cost of Analog-to-Digital Converter (ADC) and the computational complexity. We carry out the circuit-level simulation based on 180nm CMOS technology, which shows an energy consumption of 2.4 nJ/frame, and a processing speed of 45.79 μs/frame. The proposed architecture achieves 97.2% energy saving and about 6.4× speedup than state of the art. Speech recognition simulation reaches the classification accuracy of 99% using the proposed MFCC features.