An FPGA-Based Embedded Robust Speech Recognition System Designed by Combining Empirical Mode Decomposition and a Genetic Algorithm

This paper presents a robust speech measurement and recognition system based on a field-programmable gate array (FPGA), with environmental noise as its main concern. To speed up recognition on the FPGA, the discrete hidden Markov model is used to lessen the computational burden inherent in speech recognition. Empirical mode decomposition is used to decompose the measured, noise-contaminated speech signal into several intrinsic mode functions (IMFs). The IMFs are then weighted and summed to reconstruct the clean speech signal. Unlike previous research, in which IMFs were selected by trial and error for specific applications, the weight for each IMF is designed here by a genetic algorithm to obtain an optimal solution. The experimental results show that this method achieves a better recognition rate for speech subject to various environmental noises. The paper also explores the hardware realization of the designed speech measurement and recognition system on an FPGA-based embedded system with a system-on-a-chip (SoC) architecture. Because the central-processing-unit core adopted in the SoC has limited computational ability, the integer fast Fourier transform (FFT) replaces the floating-point FFT to speed up the extraction of speech features as mel-frequency cepstral coefficients. The result is a significant reduction in calculation time without affecting the speech recognition rate. The experiments indicate that the implemented hardware performs significantly better than that reported in existing research.
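The weighted-sum reconstruction and the genetic search for the weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the fitness function, the weight range, or the genetic operators, so the mean-squared-error fitness against a clean reference, the [0, 1] weight range, the tournament selection, and the names `reconstruct`, `mse`, and `ga_weights` are all assumptions made for illustration. No real empirical mode decomposition is performed; two synthetic components stand in for IMFs.

```python
import math
import random


def reconstruct(imfs, weights):
    """Weighted sum of components: x_hat[n] = sum_k w_k * imf_k[n]."""
    return [sum(w * imf[n] for w, imf in zip(weights, imfs))
            for n in range(len(imfs[0]))]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ga_weights(imfs, clean, pop_size=20, generations=40, seed=0):
    """Toy real-coded GA (assumed design): elitism, tournament selection,
    blend crossover, and Gaussian mutation, with weights clipped to [0, 1]."""
    rng = random.Random(seed)
    k = len(imfs)

    def fitness(w):
        # Assumed fitness: error of the reconstruction against a clean reference.
        return mse(reconstruct(imfs, w), clean)

    pop = [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted((fitness(w), w) for w in pop)
        nxt = [w for _, w in scored[:2]]  # keep the two best unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(scored, 3))[1]  # tournament of three
            p2 = min(rng.sample(scored, 3))[1]
            a = rng.random()
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            child = [min(1.0, max(0.0, c + rng.gauss(0.0, 0.05))) for c in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)


# Synthetic stand-ins for IMFs: a noise-like component and a tone-like component.
N = 128
_rng = random.Random(1)
tone = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
hiss = [_rng.uniform(-0.5, 0.5) for _ in range(N)]
imfs = [hiss, tone]

weights = ga_weights(imfs, tone)
err_ga = mse(reconstruct(imfs, weights), tone)
err_plain = mse(reconstruct(imfs, [1.0, 1.0]), tone)  # unweighted sum = noisy signal
print(weights, err_ga, err_plain)
```

In this toy setup the search should push the weight of the noise-like component toward 0 and the weight of the tone-like component toward 1, which mirrors the idea of letting an optimizer, rather than trial and error, decide how much each IMF contributes.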
