A general optimization procedure for spectral speech enhancement methods

Commonly used spectral amplitude estimators, such as those proposed by Ephraim and Malah, are optimal only when the statistical model is correct and the speech and noise spectral variances are known. In practice, the spectral variances have to be estimated. A simple analysis of the “decision-directed” approach to speech spectral variance estimation reveals a significant bias at low SNRs. To correct for modeling errors and estimation inaccuracies, we propose a general optimization procedure in which two gain functions are applied in parallel. The unmodified algorithm runs in the background, while the final reconstruction uses a different gain function that is optimized over a wide range of signal-to-noise ratios. When this technique is applied to the algorithms of Ephraim and Malah, a large improvement is obtained (on the order of 2 dB in segmental SNR and 0.3 points in PESQ). Moreover, less smoothing is needed in the decision-directed spectral variance estimator.
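
The parallel structure described above can be illustrated with a minimal sketch for a single frequency bin. It is not the authors' implementation: the decision-directed recursion is written in its standard textbook form, a Wiener gain stands in for the background Ephraim and Malah gain, and `placeholder_final_gain` is a hypothetical stand-in for the optimized reconstruction gain, which in practice would be tuned on data. The names `enhance_bin`, `alpha`, and `xi_min`, and their default values, are assumptions made for illustration.

```python
def wiener_gain(xi, gamma):
    """Wiener gain; an assumed stand-in for the background gain function."""
    return xi / (1.0 + xi)


def placeholder_final_gain(xi, gamma):
    """Hypothetical reconstruction gain; the optimized gain would be learned."""
    return (xi / (1.0 + xi)) ** 1.5


def enhance_bin(noisy_power, noise_var, alpha=0.98, xi_min=1e-3,
                background_gain=wiener_gain,
                final_gain=placeholder_final_gain):
    """Run both gains in parallel over the frames of one frequency bin.

    noisy_power: noisy periodogram values |Y|^2, one per frame.
    noise_var:   noise spectral variance, assumed known and constant here.
    Returns a list of (a priori SNR estimate, output gain) per frame.
    """
    prev_amp_sq = 0.0  # squared background amplitude estimate, previous frame
    results = []
    for power in noisy_power:
        gamma = power / noise_var  # a posteriori SNR
        # Decision-directed estimate of the a priori SNR.
        xi = (alpha * prev_amp_sq / noise_var
              + (1.0 - alpha) * max(gamma - 1.0, 0.0))
        xi = max(xi, xi_min)
        # The background gain only feeds the recursion for the next frame.
        g_background = background_gain(xi, gamma)
        prev_amp_sq = (g_background ** 2) * power
        # A different gain is used for the signal that is actually output.
        results.append((xi, final_gain(xi, gamma)))
    return results
```

Because only the background gain updates `prev_amp_sq`, changing the reconstruction gain leaves the a priori SNR trajectory untouched, which is what allows the output gain to be optimized separately in this sketch.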