A noise robust feature extraction algorithm using joint wavelet packet subband decomposition and AR modeling of speech signals

This paper presents a noise robust feature extraction algorithm (NRFE) based on joint wavelet packet decomposition (WPD) and autoregressive (AR) modeling of the speech signal. In contrast to the time-frequency representation based on the short-time Fourier transform (STFT), wavelet packet decomposition can better represent the non-stationary parts of the speech signal (e.g. consonants), while vowels are well described by an AR model, as in LPC analysis. The wavelet packet parameters are computed using the proposed Root-Log compression scheme. The separately extracted WPD and AR-based parameters are combined and then transformed using linear discriminant analysis (LDA) to produce a lower-dimensional output feature vector. Noise robustness is improved by applying the proposed wavelet-based denoising algorithm, which uses a modified soft thresholding procedure and a time-frequency adaptive threshold. The proposed voice activity detector, based on the skewness-to-kurtosis ratio of the LPC residual signal, is used to perform frame dropping. The speech recognition results achieved on the Aurora 2 and Aurora 3 databases show overall performance improvements of 44.7% and 48.2%, respectively, relative to the baseline MFCC front-end.
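The denoising step mentioned in the abstract builds on soft thresholding of wavelet coefficients. The abstract does not spell out the modified procedure or the time-frequency adaptive threshold, so the sketch below shows only the standard soft-thresholding rule, with a single fixed threshold. The function name `soft_threshold` and the sample values are assumptions made for illustration and do not come from the paper.

```python
def soft_threshold(coeffs, threshold):
    """Standard soft thresholding (not the paper's modified variant).

    Coefficients whose magnitude is at most `threshold` are set to zero;
    the rest are shrunk toward zero by `threshold`, keeping their sign.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)  # small coefficient: treated as noise
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


# Hypothetical wavelet coefficients and a fixed threshold, for illustration only.
coeffs = [3.0, -0.5, 0.2, -2.0, 1.0]
print(soft_threshold(coeffs, 1.0))  # [2.0, 0.0, 0.0, -1.0, 0.0]
```

In a scheme like the one the abstract describes, the threshold would presumably vary per subband and per frame rather than being a constant; that adaptation is not reproduced here.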

Zdravko Kacic | Bojan Kotnik
