Speaker Invariant and Noise Robust Speech Recognition Using Enhanced Auditory and VTL Based Features

This paper focuses on the design and implementation of a noise-resilient, speaker-independent speech recognition system for isolated word recognition. Auditory transform (AT) based features, called Cochlear Filter Cepstral Coefficients (CFCCs), are used for feature extraction. Their robustness against noise is enhanced by a wavelet-based denoising algorithm, and their robustness against variation in vocal tract length (VTL) is enhanced by the invariant-integration method. The resulting features are called Enhanced CFCC Invariant-Integration Features (ECFCCIIFs). A feature-finding neural network (FFNN) is used as the classifier for recognizing the isolated words. Results are compared with those obtained using the standard CFCC features. Under both matched and mismatched conditions, the ECFCCIIFs maintain a high recognition rate at low signal-to-noise ratios (SNRs) and remain more effective at high SNRs as well.
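The abstract does not say which wavelet family, decomposition depth, or threshold rule the denoising stage uses. As a rough illustration of what wavelet-based denoising of a speech frame can look like, the following minimal sketch assumes a Haar wavelet, three decomposition levels, and soft thresholding with the universal threshold; these choices, and all function names, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (even-length input)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; values below the threshold become 0."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)

def wavelet_denoise(signal, levels=3):
    """Denoise by soft-thresholding Haar detail coefficients.

    Assumed setup (not from the paper): Haar wavelet, universal threshold
    sigma * sqrt(2 ln N), with sigma estimated from the median absolute
    value of the finest-level detail coefficients.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")

    # Analysis: keep detail coefficients from finest to coarsest level.
    approx, details = signal, []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)

    # Robust noise estimate from the finest scale, then universal threshold.
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(n))

    # Synthesis: rebuild from coarsest to finest with shrunken details.
    for detail in reversed(details):
        approx = haar_idwt(approx, soft_threshold(detail, threshold))
    return approx

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1024, endpoint=False)
    clean = np.sin(2.0 * np.pi * 4.0 * t)
    noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(t.size)
    denoised = wavelet_denoise(noisy)
    print("MSE noisy:   ", np.mean((noisy - clean) ** 2))
    print("MSE denoised:", np.mean((denoised - clean) ** 2))
```

In a full front end, a step like this would run on the waveform or on subband signals before cepstral coefficients are computed; where exactly the paper applies it is not stated in the abstract.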
