Robust BFCC feature extraction for ASR systems

An auditory-based feature extraction algorithm, named the Basilar-membrane Frequency-band Cepstral Coefficient (BFCC), is proposed to increase the robustness of automatic speech recognition. Compared with the Fourier-spectrogram-based Mel-Frequency Cepstral Coefficient (MFCC) method, the proposed BFCC method employs an auditory spectrogram based on a gammachirp wavelet transform to simulate the auditory response of the human inner ear and thereby improve noise immunity. In addition, a Hidden Markov Model (HMM) is used to evaluate the proposed BFCC in both the training and testing phases, conducted on the AURORA-2 corpus with datasets at different Signal-to-Noise Ratio (SNR) levels. The experimental results indicate that, compared with MFCC, the Gammatone Wavelet Cepstral Coefficient (GWCC), and the Gammatone Frequency Cepstral Coefficient (GFCC), the proposed BFCC improves the speech recognition rate by 13%, 17%, and 0.5% on average, respectively, for speech samples with SNRs ranging from -5 to 20 dB.
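The pipeline the abstract describes (a gammachirp filterbank standing in for the Fourier spectrogram, followed by log energies and a DCT to obtain cepstral coefficients) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function name `bfcc_like`, the filter parameters (`n=4`, `b=1.019`, `c=-2.0`), the ERB approximation, and the geometric spacing of centre frequencies are all assumptions chosen for the sketch.

```python
import numpy as np

def erb(fc):
    # Equivalent Rectangular Bandwidth, Glasberg & Moore approximation
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammachirp(fc, fs, n=4, b=1.019, c=-2.0, dur=0.05):
    # Impulse response of a gammachirp filter: an n-th order gamma envelope
    # times a chirped carrier. The c*log(t) term produces the frequency
    # glide; with c = 0 this reduces to a gammatone filter.
    t = np.arange(1, int(dur * fs) + 1) / fs  # start at 1/fs to avoid log(0)
    g = (t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)
         * np.cos(2 * np.pi * fc * t + c * np.log(t)))
    return g / np.sqrt(np.sum(g ** 2))  # unit-energy normalisation

def bfcc_like(signal, fs, n_filters=26, n_ceps=13, fmin=100.0, fmax=None):
    # Hypothetical BFCC-style features: filter the signal with a gammachirp
    # filterbank, take log channel energies, then apply a DCT-II.
    fmax = fmax or 0.9 * fs / 2.0
    fcs = np.geomspace(fmin, fmax, n_filters)  # rough ERB-like log spacing
    log_energies = np.empty(n_filters)
    for i, fc in enumerate(fcs):
        y = np.convolve(signal, gammachirp(fc, fs), mode="same")
        log_energies[i] = np.log(np.sum(y ** 2) + 1e-12)
    # DCT-II of the log energies yields the cepstral coefficients
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * m + 1) / (2 * n_filters))
    return dct @ log_energies
```

In a full recogniser these coefficients would be computed per frame and fed to the HMM; the sketch computes a single utterance-level vector only to keep the example short.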
