Speech Enhancement via Two-Stage Dual Tree Complex Wavelet Packet Transform with a Speech Presence Probability Estimator

This paper proposes a speech enhancement algorithm based on a two-stage dual tree complex wavelet packet transform (DTCWPT), for which a speech presence probability (SPP) estimator and a generalized minimum mean squared error (MMSE) estimator are developed. To reduce the signal distortion caused by downsampling in the wavelet packet transform (WPT), a two-stage analytic decomposition that concatenates an undecimated wavelet packet transform (UWPT) with a decimated WPT is employed. The SPP estimator is derived in the DTCWPT domain under a generalized Gamma prior for speech and a Gaussian assumption for noise. Validation results show that the proposed algorithm achieves higher perceptual evaluation of speech quality (PESQ) scores and segmental signal-to-noise ratio (SegSNR) in nonstationary noise at low signal-to-noise ratio (SNR) than four other state-of-the-art speech enhancement algorithms: optimally modified log-spectral amplitude (OM-LSA), soft masking using a posteriori SNR uncertainty (SMPO), a posteriori SPP based MMSE estimation (MMSE-SPP), and adaptive Bayesian wavelet thresholding (BWT).
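As an illustration of the SegSNR measure named above, the following is a minimal sketch of one common time-domain definition: the per-frame SNR in dB, clamped to a fixed range and averaged over frames. The frame length of 256 samples, the clamping limits of -10 dB and 35 dB, and the function name `segmental_snr` are assumptions chosen for illustration; the abstract does not state the settings used in the evaluation.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB, with each frame clamped to [lo_db, hi_db].

    frame_len, lo_db and hi_db are illustrative defaults (assumptions),
    not values taken from the paper.
    """
    n = min(len(clean), len(enhanced))
    eps = 1e-10  # guards against division by zero and log of zero
    frame_snrs = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, n - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamping keeps silent or perfectly matched frames from dominating the mean.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    clean = [math.sin(0.1 * i) for i in range(1024)]
    attenuated = [0.9 * x for x in clean]  # error is 10% of the signal amplitude
    print(round(segmental_snr(clean, attenuated), 3))
```

Because each frame is clamped before averaging, a perfectly reconstructed signal scores the upper limit instead of an unbounded value, which keeps scores comparable across algorithms.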
