Isolated Word Recognition Using Enhanced MFCC and IIFs

The main objective of this paper is to design a noise-resilient, speaker-independent speech recognition system for isolated word recognition. Mel-frequency cepstral coefficients (MFCCs) are used for feature extraction. The noise robustness of MFCCs under mismatched training and testing conditions is enhanced by applying a wavelet-based denoising algorithm, and an invariant-integration method is applied to make the MFCCs robust to variation in vocal tract length (VTL). The resulting features are called enhanced MFCC invariant-integration features (EMFCCIIFs). A feature-finding neural network (FFNN) is used as the classifier for recognizing isolated words. Results are compared with those obtained using traditional MFCC features. Experiments show that, under mismatched conditions, the EMFCCIIFs maintain a high recognition rate at low signal-to-noise ratios (SNRs) and that they also perform more effectively at high SNRs.
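The wavelet-based denoising step mentioned above can be illustrated with a minimal sketch. The abstract does not say which wavelet, decomposition depth, or thresholding rule the system uses, so the one-level Haar transform, the soft-threshold rule, and the function names below are all assumptions chosen for brevity, not the paper's method.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(x):
    """One-level Haar transform of an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x) - 1, 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; zero out those below t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, threshold):
    """Assumed scheme: threshold only the detail (high-frequency) band."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, threshold))
```

With a threshold of zero the signal is reconstructed unchanged; a larger threshold suppresses small sample-to-sample fluctuations, which is where broadband noise tends to concentrate. In a pipeline like the one described, the denoised frames would then be passed to MFCC extraction.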
