Paper information - A low latency modular-level deeply integrated MFCC feature extraction architecture for speech recognition

Abstract: This paper presents a low-complexity chip that extracts Mel Frequency Cepstral Coefficients (MFCC) for a speech recognition system. The architecture operates in a continuous-flow manner and can process streaming or stored speech signals at high speed. The frame-overlap Hamming window, DFT and Mel-filter bank computations are deeply integrated so that they share memory buffers and need no bit-reversal circuit, which reduces area and latency. Compared with prior works, the normalised energy consumption and the area-delay product are reduced by 32%, and the speed is increased by 5.2 times. Further, the fixed-point word-length is optimised to minimise the area without affecting the accuracy.
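For orientation, the stages named in the abstract (overlapping frames, Hamming window, DFT, Mel-filter bank) can be written as a plain floating-point software model. This is a minimal sketch of the textbook MFCC pipeline, not the paper's hardware architecture: the function names, the 8 kHz sample rate, the 64-sample frame, the 50% overlap, the 10 filters, the 6 coefficients, and the trailing log and DCT steps are all assumptions chosen for illustration.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal, size, hop):
    """Split a signal into overlapping frames (hop < size gives overlap)."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def power_spectrum(frame):
    """Power spectrum via a naive O(n^2) DFT, bins 0..n/2."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        out.append(abs(acc) ** 2 / n)
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bins = [e * n_fft / sample_rate for e in edges]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate=8000, size=64, hop=32,
         num_filters=10, num_coeffs=6):
    """Assumed parameters; returns one coefficient list per frame."""
    win = hamming(size)
    bank = mel_filterbank(num_filters, size, sample_rate)
    result = []
    for fr in frames(signal, size, hop):
        spec = power_spectrum([x * w for x, w in zip(fr, win)])
        # Log of each filter's energy; the floor avoids log(0).
        log_e = [math.log(max(sum(h * p for h, p in zip(row, spec)), 1e-12))
                 for row in bank]
        # DCT-II of the log energies gives the cepstral coefficients.
        result.append([
            sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(num_coeffs)
        ])
    return result


# Usage: 160 samples of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(160)]
coeffs = mfcc(tone)
```

In this model each stage hands a complete buffer to the next. A floating-point model of this kind is also a common baseline when choosing fixed-point word-lengths.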

Gopalakrishnan Lakshminarayanan | Antony Xavier Glittas | S BibinSamPaul
