Performance Improvement of Adaptive Wavelet Thresholding for Speech Enhancement Using Generalized Gaussian Priors and Frame-Wise Context Modeling

This work develops an adaptive wavelet thresholding algorithm for speech enhancement that significantly outperforms other wavelet-based methods. The optimum threshold for noise reduction is formulated using generalized Gaussian priors, which characterize the statistics of both the speech and the noise wavelet coefficients. In addition, frame-wise context modeling tracks the statistical characteristics of each individual coefficient from frame to frame, so that the optimum threshold is accurate and adaptive at both the coefficient level and the frame level. The frame-wise context model is formulated through a context subspace projection of the wavelet coefficients, in which the context index serves as an invariant correspondence between the parameters of successive frames and thereby enables frame-wise tracking at the coefficient level. Simulation results show significant improvements over existing wavelet-based speech enhancement algorithms: as much as 226% in the segmental signal-to-noise ratio improvement, 36% in the perceptual evaluation of speech quality (PESQ), 17.8% in the short-time objective intelligibility (STOI), and 33.3% in the cepstral distance. When benchmarked against well-established short-time Fourier transform (STFT) based methods, the proposed wavelet thresholding algorithm performs favorably and more robustly, particularly under non-stationary noise conditions, and introduces no musical noise.
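
The abstract does not state the optimum threshold itself. The sketch below therefore illustrates only the two generic ingredients it names. The first is fitting a generalized Gaussian (GG) shape parameter to a set of coefficients by moment matching. The second is a GG-motivated soft threshold. As a stand-in for the paper's optimum threshold, it uses the BayesShrink rule T = σ_n²/σ_x of Chang, Yu and Vetterli, which was derived as a near-optimal soft threshold for a GG signal in Gaussian noise. That rule does not use a GG prior for the noise, as described above. All function names are hypothetical. The "speech" (Laplacian) and noise (Gaussian) coefficients are synthetic rather than taken from a wavelet transform. The shape is fitted to the clean coefficients only to demonstrate the estimator.

```python
import math
import random
import statistics
from typing import List, Sequence


def ggd_ratio(beta: float) -> float:
    """E[X^2] / (E|X|)^2 for a zero-mean generalized Gaussian with shape beta
    (beta = 2 is Gaussian, beta = 1 is Laplacian). Decreasing in beta."""
    return math.gamma(1 / beta) * math.gamma(3 / beta) / math.gamma(2 / beta) ** 2


def estimate_ggd_shape(x: Sequence[float], lo: float = 0.1, hi: float = 10.0) -> float:
    """Moment-matching shape estimate: solve ggd_ratio(beta) = sample ratio by bisection."""
    m1 = sum(abs(v) for v in x) / len(x)
    m2 = sum(v * v for v in x) / len(x)
    target = m2 / (m1 * m1)
    if target >= ggd_ratio(lo):  # heavier-tailed than the search range allows
        return lo
    if target <= ggd_ratio(hi):  # flatter than the search range allows
        return hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) > target:
            lo = mid  # ratio still too large, so the shape must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mad_noise_std(noise_coeffs: Sequence[float]) -> float:
    """Robust noise std from noise-dominated coefficients (median |c| / 0.6745)."""
    return statistics.median(abs(c) for c in noise_coeffs) / 0.6745


def bayes_shrink_threshold(coeffs: Sequence[float], noise_std: float) -> float:
    """BayesShrink T = sigma_n^2 / sigma_x, with sigma_x^2 = max(var_y - sigma_n^2, 0)."""
    var_y = sum(c * c for c in coeffs) / len(coeffs)
    var_x = max(var_y - noise_std ** 2, 0.0)
    if var_x == 0.0:
        return max(abs(c) for c in coeffs)  # no detectable signal: remove everything
    return noise_std ** 2 / math.sqrt(var_x)


def soft_threshold(coeffs: Sequence[float], t: float) -> List[float]:
    """Shrink each coefficient toward zero by t, keeping its sign."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


random.seed(0)
clean = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(20000)]  # Laplacian
noisy = [c + random.gauss(0, 0.5) for c in clean]
sigma_n = mad_noise_std([random.gauss(0, 0.5) for _ in range(4000)])  # e.g. a speech-free segment
beta_hat = estimate_ggd_shape(clean)
t = bayes_shrink_threshold(noisy, sigma_n)
denoised = soft_threshold(noisy, t)
mse_noisy = sum((y - c) ** 2 for y, c in zip(noisy, clean)) / len(clean)
mse_den = sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)
print(f"beta_hat={beta_hat:.2f}  T={t:.3f}  MSE {mse_noisy:.3f} -> {mse_den:.3f}")
```

The shape estimate should land near 1 for the Laplacian coefficients. The single subband-wide threshold is small here because the synthetic signal is strong relative to the noise. A threshold that also depends on the fitted shapes, as the abstract's formulation appears to, is not reproduced.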

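Context modeling for wavelet thresholding originates in Chang, Yu and Vetterli's image-denoising work. There, each coefficient's variance is estimated from other coefficients with a similar "context" (a weighted average of neighboring magnitudes) rather than from a fixed spatial window, and a BayesShrink-style threshold is then applied per coefficient. The sketch below applies that idea within one frame of coefficients. It then smooths the variance estimate across frames by context rank, which is a purely hypothetical reading of "context index as the invariant correspondence between successive frames". It assumes equal neighbor weights (Chang et al. fit them by least squares), a known noise standard deviation, and hypothetical parameters `half_window` and `alpha`. It is not the paper's context subspace projection.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def neighbor_context(frame: Sequence[float], offsets=(-2, -1, 1, 2)) -> List[float]:
    """Context of each coefficient: equal-weight mean of neighboring magnitudes
    (a simplification; Chang et al. fit these weights by least squares)."""
    n = len(frame)
    ctx = []
    for i in range(n):
        nb = [abs(frame[i + o]) for o in offsets if 0 <= i + o < n]
        ctx.append(sum(nb) / len(nb))
    return ctx


def context_thresholds(
    frame: Sequence[float],
    noise_std: float,
    half_window: int = 8,
    prev_var: Optional[List[float]] = None,
    alpha: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Per-coefficient thresholds noise_var / sigma_x, with sigma_x estimated from the
    coefficients nearest in context. Also returns the variance indexed by context
    rank, to be passed back in as prev_var for the next frame."""
    n = len(frame)
    ctx = neighbor_context(frame)
    order = sorted(range(n), key=lambda i: ctx[i])  # context rank k -> position i
    sq = [frame[i] ** 2 for i in order]
    var_by_rank = []
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        var_by_rank.append(sum(sq[lo:hi]) / (hi - lo))
    if prev_var is not None and len(prev_var) == n:
        # HYPOTHETICAL frame-wise tracking: first-order recursive smoothing of the
        # variance estimate, matched across frames by context rank, not position.
        var_by_rank = [alpha * p + (1 - alpha) * v for p, v in zip(prev_var, var_by_rank)]
    noise_var = noise_std ** 2
    thresholds = [0.0] * n
    for k, i in enumerate(order):
        var_x = max(var_by_rank[k] - noise_var, 0.0)
        # No detectable signal variance: an infinite threshold zeroes the coefficient.
        thresholds[i] = noise_var / math.sqrt(var_x) if var_x > 0 else math.inf
    return thresholds, var_by_rank


def soft_threshold_each(frame: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Soft-threshold each coefficient with its own threshold."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c, t in zip(frame, thresholds)]


def run_demo(seed: int = 1, n: int = 128, sigma: float = 0.5, frames: int = 5) -> float:
    """Ratio of denoised to noisy squared error over a few synthetic frames."""
    rng = random.Random(seed)
    prev = None
    err_noisy = err_den = 0.0
    for _ in range(frames):
        # Toy frame: a cluster of large "speech" coefficients in white Gaussian noise.
        clean = [rng.gauss(0, 3) if 40 <= i < 60 else 0.0 for i in range(n)]
        noisy = [c + rng.gauss(0, sigma) for c in clean]
        thr, prev = context_thresholds(noisy, sigma, prev_var=prev)
        den = soft_threshold_each(noisy, thr)
        err_noisy += sum((y - c) ** 2 for y, c in zip(noisy, clean))
        err_den += sum((d - c) ** 2 for d, c in zip(den, clean))
    return err_den / err_noisy


print(f"denoised/noisy squared-error ratio: {run_demo():.3f}")
```

Because each coefficient gets its own threshold, coefficients in quiet, low-context regions receive very large thresholds and are zeroed. Coefficients inside the high-energy cluster are shrunk only slightly. Under the stated assumption, matching parameters by context rank rather than by position is one way to let the recursion carry over between frames even when the energy shifts in position.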