A noise robust feature extraction algorithm using joint wavelet packet subband decomposition and AR modeling of speech signals

This paper presents a noise-robust feature extraction algorithm (NRFE) based on joint wavelet packet decomposition (WPD) and autoregressive (AR) modeling of the speech signal. In contrast to time-frequency representations based on the short-time Fourier transform (STFT), wavelet packet decomposition can better represent the non-stationary parts of the speech signal (e.g., consonants), while vowels are well described by an AR model, as in LPC analysis. A proposed Root-Log compression scheme is used to compute the wavelet packet parameters. The separately extracted WPD-based and AR-based parameters are combined and then transformed with linear discriminant analysis (LDA) to produce a lower-dimensional output feature vector. Noise robustness is improved by a proposed wavelet-based denoising algorithm that uses a modified soft-thresholding procedure and a time-frequency adaptive threshold. A proposed voice activity detector, based on the skewness-to-kurtosis ratio of the LPC residual signal, is used to discard non-speech frames (frame dropping). Speech recognition results on the Aurora 2 and Aurora 3 databases show overall relative performance improvements of 44.7% and 48.2%, respectively, over the baseline MFCC front-end.
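The abstract names the processing stages without giving their formulas, so the following is only a minimal standard-library Python sketch of generic textbook versions of those stages, not the authors' method. Assumptions, all labeled again in the code: a Haar wavelet is used because the abstract does not name the wavelet; Donoho's standard soft thresholding with a universal threshold stands in for the paper's modified soft thresholding and time-frequency adaptive threshold; plain log compression stands in for the Root-Log scheme; LDA and the VAD decision rule are omitted. Every function name is hypothetical.

```python
import math
import statistics


def haar_step(x):
    """One level of the orthonormal Haar filter bank (assumed wavelet): (lowpass, highpass)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(x) // 2  # an odd trailing sample is dropped
    low = [(x[2 * i] + x[2 * i + 1]) * s for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) * s for i in range(half)]
    return low, high


def wavelet_packet(x, levels):
    """Full wavelet packet tree: both low and high branches are split at every level.

    Returns 2**levels leaf subbands in natural (Paley) order, not frequency order.
    """
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar_step(node)]
    return nodes


def soft_threshold(coeffs, t):
    """Standard (Donoho) soft thresholding, NOT the paper's modified procedure."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def universal_threshold(detail, n):
    """sigma * sqrt(2 ln n), with sigma estimated as median(|d|) / 0.6745."""
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def autocorr(x, order):
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the AR normal equations for a[1..p], with x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(x, a):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - 1 - k]; a[0] is the lag-1 coefficient."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(p)) for n in range(p, len(x))]


def skew_kurt_ratio(e):
    """Sample skewness divided by sample (non-excess) kurtosis; the paper's normalization may differ."""
    n = len(e)
    mu = sum(e) / n
    m2 = sum((v - mu) ** 2 for v in e) / n
    if m2 == 0.0:
        return 0.0
    m3 = sum((v - mu) ** 3 for v in e) / n
    m4 = sum((v - mu) ** 4 for v in e) / n
    return (m3 / m2 ** 1.5) / (m4 / m2 ** 2)


def frame_features(frame, levels=3, order=10):
    """Log energies of thresholded WPD leaves (log in place of Root-Log), then AR coefficients."""
    _, finest = haar_step(frame)  # finest-scale details estimate the noise level
    t = universal_threshold(finest, len(frame))
    leaves = wavelet_packet(frame, levels)
    # Leave the all-lowpass leaf (index 0) untouched, as in classic wavelet denoising.
    leaves = [b if i == 0 else soft_threshold(b, t) for i, b in enumerate(leaves)]
    log_energy = [math.log(sum(c * c for c in b) + 1e-10) for b in leaves]
    a, _ = levinson_durbin(autocorr(frame, order), order)
    return log_energy + a  # the paper then projects the combined vector with LDA (not shown)


if __name__ == "__main__":
    # A toy 256-sample "frame": two sinusoids, no real speech involved.
    frame = [math.sin(0.3 * n) + 0.05 * math.sin(2.7 * n) for n in range(256)]
    print(len(frame_features(frame)))  # 8 log subband energies + 10 AR coefficients = 18
    a, _ = levinson_durbin(autocorr(frame, 10), 10)
    print(skew_kurt_ratio(lpc_residual(frame, a)))  # VAD statistic; decision rule not shown
```

Splitting both branches at every level is what distinguishes a wavelet packet tree from a plain dyadic wavelet transform, which splits only the lowpass branch; because the filter bank here is orthonormal, the leaf coefficients preserve the frame energy.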
