Comparison of Parameter Estimation Methods for Single-Microphone Multi-Frame Wiener Filtering

The multi-frame Wiener filter (MFWF) for single-microphone speech enhancement exploits speech correlation across consecutive time frames in the short-time Fourier transform (STFT) domain. To achieve a high speech correlation, an STFT with a high time resolution but a low frequency resolution is typically applied. The MFWF can be decomposed into a multi-frame minimum power distortionless response (MFMPDR) filter and a single-frame Wiener postfilter. Implementing the MFWF using this decomposition requires estimates of several parameters, namely the speech correlation vector, the noisy speech correlation matrix, and the power spectral densities at the output of the MFMPDR filter. Correlations can be estimated directly in the low-frequency-resolution STFT filterbank, indirectly by estimating periodograms in a high-frequency-resolution filterbank and applying the Wiener-Khinchin theorem, or by combining both approaches. In this paper, we compare the performance of different estimators for the required parameters. Experimental results for different speech material, noise conditions, and signal-to-noise ratios show that a combined estimator for the speech correlation vector yields the best speech quality compared with existing direct and indirect estimators.
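
The decomposition described above can be illustrated with a minimal sketch for a single time-frequency bin. It assumes the commonly used MPDR form h = R_y^{-1} γ_x / (γ_x^H R_y^{-1} γ_x), where R_y is the noisy speech correlation matrix and γ_x is the normalized speech correlation vector, followed by a Wiener gain computed from the speech and noise power spectral densities at the filter output. The function names, the diagonal loading, and the `eps` regularizer are illustrative assumptions, not taken from the paper, and the parameter estimators compared in the paper are not reproduced here.

```python
import numpy as np

def mfmpdr_filter(R_y, gamma_x, diag_load=1e-6):
    """MFMPDR filter for one time-frequency bin (illustrative sketch).

    R_y      : (N, N) Hermitian noisy speech correlation matrix
    gamma_x  : (N,) normalized speech correlation vector
    diag_load: relative diagonal loading (an assumption, added for stability)
    """
    N = R_y.shape[0]
    # Regularize the matrix before inversion; the loading level is hypothetical.
    R = R_y + diag_load * np.real(np.trace(R_y)) / N * np.eye(N)
    Rinv_gamma = np.linalg.solve(R, gamma_x)
    # Normalization enforces the distortionless constraint h^H gamma_x = 1.
    return Rinv_gamma / (gamma_x.conj() @ Rinv_gamma)

def wiener_postfilter_gain(phi_x_out, phi_n_out, eps=1e-12):
    """Single-frame Wiener gain from the PSDs at the MFMPDR output."""
    return phi_x_out / (phi_x_out + phi_n_out + eps)

def mfwf_output(y_vec, R_y, gamma_x, phi_x_out, phi_n_out):
    """Apply the MFMPDR filter to N stacked noisy frames, then the postfilter."""
    h = mfmpdr_filter(R_y, gamma_x)
    return wiener_postfilter_gain(phi_x_out, phi_n_out) * (h.conj() @ y_vec)
```

In this sketch, `y_vec` stacks the current and the N-1 previous noisy STFT coefficients of one frequency bin. The quality of the result depends entirely on how `R_y`, `gamma_x`, and the output PSDs are estimated, which is the question the paper addresses.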
