Speech enhancement based on a modified spectral subtraction method

This paper presents a method for enhancing noisy speech based on a modified spectral subtraction performed on the short-time magnitude spectrum. Traditional spectral subtraction neglects the cross-terms between the clean speech and noise spectra on the assumption that the two signals are completely uncorrelated. The proposed method takes these cross-terms into account, since that assumption does not hold for most noise types. The noise estimate to be subtracted from the noisy speech spectrum is determined from the low-frequency regions of the current noisy frame rather than only from the initial silence frames. We argue that this approach to noise estimation can track the time variation of non-stationary noise. Using the noise estimates thus obtained, a procedure is formulated to reduce the noise in the magnitude spectrum of the noisy speech signal. The noise-reduced magnitude spectrum is then recombined with the unchanged phase spectrum to produce a modified complex spectrum, from which the enhanced frame is synthesized. Extensive simulations are carried out on the NOIZEUS database to evaluate the performance of the proposed method. Objective measures, spectrogram analysis, and subjective listening tests show that the proposed method consistently outperforms one of the state-of-the-art speech enhancement methods for speech corrupted by babble or car noise at high as well as very low SNR levels.
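The two ideas in the abstract can be illustrated with a minimal sketch, which is not the paper's actual procedure. The first function checks numerically that the power spectrum of a noisy frame differs from the sum of the clean and noise power spectra by exactly the cross-term 2 Re(S D*), the quantity traditional spectral subtraction drops. The second performs plain magnitude subtraction on one frame and reuses the noisy phase. Its noise estimate (the mean magnitude of the lowest few bins of the current frame) is only a hypothetical stand-in for the low-frequency estimator described above, and the names `n_low_bins` and `floor` are assumptions introduced here for illustration.

```python
import numpy as np


def cross_term_gap(clean, noise):
    """Compare |Y|^2 - (|S|^2 + |D|^2) with the cross-term 2*Re(S * conj(D)).

    S, D, Y are the spectra of the clean, noise, and noisy (clean + noise)
    frames. Returns both quantities per frequency bin; they should agree.
    """
    S = np.fft.rfft(clean)
    D = np.fft.rfft(noise)
    Y = np.fft.rfft(clean + noise)
    gap = np.abs(Y) ** 2 - (np.abs(S) ** 2 + np.abs(D) ** 2)
    cross = 2.0 * np.real(S * np.conj(D))
    return gap, cross


def enhance_frame(noisy_frame, n_low_bins=4, floor=0.02):
    """Basic magnitude spectral subtraction on a single frame.

    Assumption (not from the paper): the noise magnitude is one flat value,
    the mean magnitude of the lowest `n_low_bins` bins of the current frame.
    `floor` keeps a small fraction of the noisy magnitude so that no bin
    becomes negative after subtraction.
    """
    window = np.hanning(len(noisy_frame))
    Y = np.fft.rfft(noisy_frame * window)
    mag, phase = np.abs(Y), np.angle(Y)

    noise_mag = mag[:n_low_bins].mean()            # hypothetical estimator
    reduced_mag = np.maximum(mag - noise_mag, floor * mag)

    # Recombine the reduced magnitude with the unchanged noisy phase.
    modified = reduced_mag * np.exp(1j * phase)
    return np.fft.irfft(modified, n=len(noisy_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 256
    t = np.arange(n)
    clean = np.sin(2 * np.pi * 40 * t / n)          # synthetic "speech" tone
    noise = 0.3 * rng.standard_normal(n)            # synthetic noise

    gap, cross = cross_term_gap(clean, noise)
    print("cross-term identity holds:", np.allclose(gap, cross))

    enhanced = enhance_frame(clean + noise)
    print("enhanced frame length:", enhanced.shape[0])
```

In a full system the frames would overlap and be recombined by overlap-add, and the noise estimate would vary with frequency. Both are omitted here to keep the sketch short.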
