A low latency modular-level deeply integrated MFCC feature extraction architecture for speech recognition

Abstract: In this paper, a low-complexity chip that extracts Mel-frequency cepstral coefficients (MFCCs) for a speech recognition system is presented. The architecture operates in a continuous-flow manner and can process streaming or stored speech signals at high speed. The frame-overlap Hamming window, DFT and Mel-filter bank computations are deeply integrated so that they share memory buffers and avoid a bit-reversal circuit, which reduces area and latency. Moreover, normalised energy consumption and area-delay product are reduced by 32%, and speed is increased by 5.2 times compared to prior works. Further, the fixed-point word length is optimised to minimise area without affecting accuracy.
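
For orientation, the stages named in the abstract (overlapping frames, Hamming window, DFT, Mel-filter bank) can be sketched in software as the textbook MFCC pipeline. This is only a floating-point reference sketch, not the chip's architecture: the buffer sharing, bit-reversal avoidance and fixed-point arithmetic described above are not modelled, and the sample rate, frame length, hop size, filter count and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def hamming(n_samples):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def power_spectrum(frame):
    """Naive O(N^2) DFT; returns |X[k]|^2 for k = 0 .. N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, frame_len=64, hop=32, n_filters=8, n_ceps=5):
    """Return one list of n_ceps cepstral coefficients per overlapping frame."""
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        power = power_spectrum(frame)
        # Log of each mel-band energy; the small offset avoids log(0).
        log_e = [math.log(sum(h * p for h, p in zip(row, power)) + 1e-12)
                 for row in bank]
        # DCT-II of the log energies gives the cepstral coefficients.
        features.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)])
    return features


# Usage: a 440 Hz test tone, 256 samples at an assumed 8 kHz sample rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc(tone)
print(len(coeffs), len(coeffs[0]))
```

With a hop of half the frame length, consecutive frames overlap by 50%, which is the property a frame-overlap window stage has to handle. A hardware design would normally replace the naive DFT with an FFT and the floating-point arithmetic with fixed-point words.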
