Speech enhancement based on adaptive wavelet denoising on multitaper spectrum

Classical speech enhancement algorithms often require a good estimate of the short-time power spectrum, obtained, for instance, with periodogram methods. However, it is well known that traditional periodogram estimates have large variance, which produces "musical noise" after enhancement. To alleviate this problem, multitaper spectrum (MTS) estimators with wavelet denoising have been proposed. In this paper, we investigate the properties of the MTS of noisy speech signals. We find that, in the log MTS domain, the variance of the noise varies with the magnitude of the underlying speech spectrum. This implies that, when wavelet denoising is applied to the log MTS, the constant threshold used in traditional methods is not appropriate. Based on this observation, we develop a wavelet denoising method with an adaptive threshold for multitaper power spectrum estimation. Simulation results show that the spectrum estimated with the proposed method is consistently more accurate than that obtained with traditional uniform thresholding. Hence, the proposed method further improves current speech enhancement algorithms based on the MTS approach.
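
As a rough illustration of the pipeline outlined above (multitaper estimate, logarithm, wavelet thresholding), the following minimal sketch may help. It is not the method of this paper. It assumes sine tapers rather than any particular taper family, a hand-written Haar wavelet, soft thresholding, and a noise level estimated from the median absolute deviation of the finest detail coefficients. The function names and the per-bin `scale` weights are hypothetical; in particular, the rule that derives those weights from the underlying speech spectrum is not reproduced here. Setting `scale` to all ones corresponds to uniform thresholding.

```python
import numpy as np


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n, shape (k, n)."""
    t = np.arange(1, n + 1)
    j = np.arange(1, k + 1)[:, None]
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * j * t / (n + 1))


def multitaper_spectrum(x, k=5):
    """Average the k eigenspectra (tapered periodograms) of one frame."""
    x = np.asarray(x, dtype=float)
    tapers = sine_tapers(len(x), k)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return eigenspectra.mean(axis=0)


def haar_dwt(v):
    """One level of the orthonormal Haar transform (even length input)."""
    a = (v[0::2] + v[1::2]) / np.sqrt(2.0)
    d = (v[0::2] - v[1::2]) / np.sqrt(2.0)
    return a, d


def haar_idwt(a, d):
    """Invert one level of the orthonormal Haar transform."""
    v = np.empty(2 * len(a))
    v[0::2] = (a + d) / np.sqrt(2.0)
    v[1::2] = (a - d) / np.sqrt(2.0)
    return v


def soft(d, thr):
    """Soft-threshold coefficients d by thr (scalar or array)."""
    return np.sign(d) * np.maximum(np.abs(d) - thr, 0.0)


def denoise_log_mts(log_s, levels=3, scale=None):
    """Wavelet-denoise a log spectrum with per-bin threshold weights.

    scale is a hypothetical array of per-bin multipliers applied to a
    universal threshold; None means a uniform (constant) threshold.
    """
    approx = np.asarray(log_s, dtype=float)
    n = len(approx)
    if n % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    scale = np.ones(n) if scale is None else np.asarray(scale, dtype=float)

    details, scales = [], []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        # Carry the weights to the coarser grid by averaging neighbours.
        scale = 0.5 * (scale[0::2] + scale[1::2])
        details.append(d)
        scales.append(scale)

    # Noise level from the finest details (median absolute deviation).
    sigma = np.median(np.abs(details[0])) / 0.6745
    base = sigma * np.sqrt(2.0 * np.log(n))

    # Reconstruct from the coarsest level back to the finest.
    for d, s in zip(reversed(details), reversed(scales)):
        approx = haar_idwt(approx, soft(d, base * s))
    return approx


# Example on a synthetic white-noise frame (illustrative data only).
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
log_mts = np.log(multitaper_spectrum(frame, k=5))[:-1]  # drop Nyquist bin
uniform = denoise_log_mts(log_mts)
weights = np.linspace(0.5, 1.5, log_mts.size)  # hypothetical weights
adaptive = denoise_log_mts(log_mts, scale=weights)
```

Passing the weights as an explicit array keeps the thresholding step separate from whatever rule produces them, so a uniform and a spectrum-dependent threshold can be compared on the same log MTS.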