Speech enhancement based on noise-compensated phase spectrum

In this paper, a speech enhancement method based on noise compensation of the short-time phase spectrum is presented. The noise estimate used to modify the phase spectrum of the noisy speech is determined from the low-frequency regions of the current noisy frame rather than from the initial silence frames alone. We argue that this approach to noise estimation makes it possible to track the time variation of non-stationary noise. Using the noise estimates thus obtained, a procedure is formulated to compensate for the distortion in the phase spectrum, which typical speech enhancement methods leave unchanged. The noise-compensated phase spectrum is then recombined with the magnitude spectrum to produce a modified complex spectrum, from which an enhanced frame is synthesized. Extensive simulations are carried out on the NOIZEUS database to evaluate the performance of the proposed method. Objective measures, spectrogram analysis, and informal subjective listening tests show that the proposed method consistently outperforms a state-of-the-art speech enhancement method for speech corrupted by car or babble noise at very low SNR levels.
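The per-frame processing described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption: the function name `enhance_frame`, the parameters `lam` and `n_low_bins`, the flat noise estimate taken as the mean magnitude of the lowest non-DC bins, and the antisymmetric offset used to compensate the phase are illustrative choices, not the authors' stated procedure.

```python
import numpy as np

def enhance_frame(frame, lam=3.0, n_low_bins=4):
    """Sketch of phase-spectrum noise compensation for one frame.

    lam and n_low_bins are hypothetical tuning parameters, not values
    taken from the paper.
    """
    n = len(frame)
    spectrum = np.fft.fft(frame * np.hanning(n))
    magnitude = np.abs(spectrum)

    # Assumed noise estimate: mean magnitude of the lowest non-DC bins
    # of the current frame, applied uniformly across all bins.
    noise_mag = np.full(n, magnitude[1:1 + n_low_bins].mean())

    # Antisymmetric weighting: +1 on positive-frequency bins, -1 on
    # negative-frequency bins, 0 at DC (and at Nyquist when n is even).
    psi = np.zeros(n)
    psi[1:(n + 1) // 2] = 1.0
    psi[n // 2 + 1:] = -1.0

    # Offset the complex spectrum and keep only its phase.
    compensated_phase = np.angle(spectrum + lam * psi * noise_mag)

    # Recombine the compensated phase with the noisy magnitude. The result
    # is no longer conjugate symmetric, so the real part of the inverse
    # transform is taken as the enhanced frame.
    modified = magnitude * np.exp(1j * compensated_phase)
    return np.real(np.fft.ifft(modified))
```

In a full system the enhanced frames would be combined by overlap-add across the utterance. With `lam=0.0` the sketch returns the windowed input unchanged, which is a quick sanity check that only the phase term alters the signal.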
