Speech enhancement using a wavelet thresholding method based on symmetric Kullback-Leibler divergence

The performance of wavelet thresholding methods for speech enhancement depends strongly on accurately estimating the threshold value in the wavelet sub-bands. In this paper, we propose a new method for estimating this threshold more accurately. The proposed threshold is first obtained from the symmetric Kullback-Leibler (K-L) divergence between the probability distributions of the noisy speech and noise wavelet coefficients. In the next step, this value is refined using the segmental Signal-to-Noise Ratio (SNR). We use a set of TIMIT utterances to assess the performance of the proposed threshold. The algorithm is evaluated using the Perceptual Evaluation of Speech Quality (PESQ) score and the SNR improvement in ideal and real modes. Compared with conventional wavelet thresholding approaches, the proposed method yields average SNR improvements of 2.25 dB and 1 dB, and PESQ score increases of up to 1.1 and 0.75, in the ideal and real modes, respectively. Compared with the adaptive thresholding approach, the average SNR improvements are 1.6 dB and 0.9 dB in the ideal and real modes, respectively. The PESQ score of the adaptive thresholding method is 0.25 higher than that of the proposed method in the real mode and 0.5 lower in the ideal mode.

A threshold value estimation method based on the K-L divergence is proposed. The threshold value is improved using the segmental SNR. The proposed threshold performed well in both ideal and real modes.
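To make the two quantities named in the abstract concrete, the following is a minimal sketch of how a symmetric K-L divergence between two sets of coefficients and a segmental SNR might be computed. It is not the authors' implementation: the histogram-based density estimate, the bin count, the frame length, and the function names `symmetric_kl` and `segmental_snr_db` are all assumptions made for illustration, and the abstract does not state how the threshold is derived from these quantities, so that step is left out.

```python
import numpy as np


def symmetric_kl(x, y, bins=64, eps=1e-12):
    """Symmetric K-L divergence D(p||q) + D(q||p) between two sample sets.

    The densities p and q are estimated with histograms on a shared grid
    (an assumption; the abstract does not specify the estimator).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    p, _ = np.histogram(x, bins=bins, range=(lo, hi))
    q, _ = np.histogram(y, bins=bins, range=(lo, hi))
    # Add a small constant so empty bins do not produce log(0), then normalize.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    # sum((p - q) * log(p / q)) equals D(p||q) + D(q||p).
    return float(np.sum((p - q) * np.log(p / q)))


def segmental_snr_db(clean, processed, frame_len=256, eps=1e-12):
    """Mean of per-frame SNRs in dB over non-overlapping frames.

    The per-frame clamping often used in practice is omitted here.
    """
    clean = np.asarray(clean, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n_frames = len(clean) // frame_len
    snrs = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - processed[i * frame_len:(i + 1) * frame_len]
        snrs.append(10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps)))
    return float(np.mean(snrs))
```

In a wavelet-domain setting, `x` and `y` would be the noisy-speech and noise coefficients of one sub-band; a larger divergence indicates that the sub-band differs more from pure noise.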
