A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets

Abstract: In this paper, we propose a new speech enhancement system based on the wavelet thresholding algorithm. The basic wavelet thresholding algorithm has several defects, including its assumption of white Gaussian noise (WGN), its malfunction in unvoiced segments, and poor auditory quality. The proposed system introduces a new algorithm that does not require any voiced/unvoiced detection. It also introduces adaptive wavelet thresholding and modified thresholding functions to improve both speech enhancement performance and automatic speech recognition (ASR) accuracy. A new voice activity detector (VAD) was designed to update the noise statistics when the system faces colored and non-stationary noise. The proposed method was evaluated on several speakers and under various noise conditions, including white Gaussian noise, pink noise, and multi-talker babble noise. The SNR and ASR results show that the new method substantially improves the performance of speech enhancement based on wavelet thresholding.
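
For orientation, the basic wavelet thresholding that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of standard soft thresholding with the universal threshold, not the adaptive method or the modified thresholding functions proposed in the paper, whose details the abstract does not give. The function names are hypothetical, the input is assumed to be a list of wavelet (packet) coefficients from one subband, and the noise level is assumed to be estimated from the median absolute coefficient, which is only appropriate under the white Gaussian noise assumption the abstract criticizes.

```python
import math
import statistics


def soft_threshold(coeffs, threshold):
    """Soft thresholding: zero coefficients whose magnitude is below
    `threshold` and shrink the remaining ones toward zero by `threshold`."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def universal_threshold(coeffs):
    """Universal threshold sigma * sqrt(2 * ln(N)).

    Assumption: sigma is estimated as median(|c|) / 0.6745, a common
    robust estimate that presumes white Gaussian noise.
    """
    sigma = statistics.median([abs(c) for c in coeffs]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def denoise_subband(coeffs):
    """Apply soft thresholding with the universal threshold to one subband."""
    return soft_threshold(coeffs, universal_threshold(coeffs))
```

Because a single threshold is applied to every frame, low-energy unvoiced segments can be suppressed along with the noise, which is the kind of defect the abstract attributes to the basic algorithm.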
