An Adaptive Wavelet-Based Denoising Algorithm for Enhancing Speech in Non-stationary Noise Environment

Traditional wavelet-based speech enhancement algorithms are ineffective in the presence of highly non-stationary noise because the local noise spectrum is difficult to estimate accurately. In this paper, a simple noise estimation method that employs a voice activity detector (VAD) is proposed. Using the VAD decision, the output of a wavelet-based speech enhancement algorithm can be improved in the presence of random noise bursts. The noisy speech is first preprocessed using bark-scale wavelet packet decomposition (BSWPD), which converts the noisy signal into wavelet coefficients (WCs). A VAD based on bark-scale spectral entropy, called BS-Entropy, is found to be superior to energy-based approaches, especially when the noise level varies. The wavelet coefficient threshold (WCT) of each subband is then adjusted over time according to the VAD result. A speech-dominated frame is further categorized as either voiced or unvoiced. A voiced frame possesses a strong tone-like spectrum in the lower subbands, so the WCs of the lower bands must be preserved. Conversely, the WCT of the lower bands is increased when the frame is categorized as unvoiced. In a noise-dominated frame, the background noise can be almost completely removed by increasing the WCT. Objective and subjective experiments are used to evaluate the proposed system. The results show that the algorithm is effective under various noise conditions, especially colored noise and non-stationary noise.
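The frame-dependent thresholding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the frame labels, and the scale factors 2.0, 0.5, and 1.5 are all hypothetical, chosen only to show the direction of each adjustment. The sketch assumes the subband energies and wavelet coefficients have already been computed, for example by a wavelet packet decomposition, and it uses the soft-thresholding rule as an assumed shrinkage function. A flat, noise-like energy distribution gives a higher entropy than a peaky, tone-like one, which is the property an entropy-based VAD can exploit.

```python
import math


def band_entropy(band_energies):
    """Shannon entropy (in nats) of the normalized subband-energy distribution."""
    total = sum(band_energies)
    if total <= 0.0:
        return 0.0
    probs = [e / total for e in band_energies if e > 0.0]
    return -sum(p * math.log(p) for p in probs)


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold` (soft thresholding)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def adapt_threshold(base_threshold, frame_class, is_low_band):
    """Scale a subband threshold by frame class.

    The scale factors are hypothetical placeholders, not values from the paper.
    """
    if frame_class == "noise":
        # Noise-dominated frame: raise the threshold to remove background noise.
        return base_threshold * 2.0
    if frame_class == "voiced" and is_low_band:
        # Voiced frame: lower the threshold so low-band coefficients are preserved.
        return base_threshold * 0.5
    if frame_class == "unvoiced" and is_low_band:
        # Unvoiced frame: raise the low-band threshold.
        return base_threshold * 1.5
    return base_threshold


# Usage: a flat energy distribution has higher entropy than a peaky one.
flat = band_entropy([1.0, 1.0, 1.0, 1.0])
peaky = band_entropy([10.0, 0.1, 0.1, 0.1])
assert flat > peaky

# Usage: denoise the low-band coefficients of a frame labeled as noise.
wct = adapt_threshold(1.0, "noise", is_low_band=True)
cleaned = soft_threshold([3.0, -3.0, 0.5], wct)
```

In a full system the frame label would come from comparing the entropy against a decision threshold and from a separate voiced/unvoiced classifier. Both are left out here because the abstract does not specify them.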
