Paper Information - Improved A Posteriori Speech Presence Probability Estimation Based on a Likelihood Ratio With Fixed Priors

In this paper, we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier transform domain. In contrast to existing approaches, this estimator does not rely on an adaptively estimated and thus signal-dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation of the speech presence probability from the estimation of the clean speech spectral coefficients in a speech enhancement task. Using both a fixed a priori signal-to-noise ratio and a fixed prior probability of speech presence, the proposed a posteriori speech presence probability estimator achieves probabilities close to zero for speech absence and probabilities close to one for speech presence. While state-of-the-art speech presence probability estimators use adaptive prior probabilities and signal-to-noise ratio estimates, we argue that these quantities should reflect true a priori information that shall not depend on the observed signal. We present a detection theoretic framework for determining the fixed a priori signal-to-noise ratio. The proposed estimator is conceptually simple and yields a better tradeoff between speech distortion and noise leakage than state-of-the-art estimators.
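The estimator described in the abstract can be illustrated with a minimal sketch. It assumes a complex Gaussian model for the noise and noisy-speech spectral coefficients, which the abstract does not state, and under that assumption evaluates the a posteriori speech presence probability from the likelihood ratio at a single time-frequency point:

P(H1 | y) = 1 / (1 + (P(H0) / P(H1)) * (1 + xi_H1) * exp(-gamma * xi_H1 / (1 + xi_H1)))

Here gamma is the ratio of observed noisy power to noise power, and xi_H1 is the fixed a priori signal-to-noise ratio. The function name and the default values (15 dB and a prior of 0.5) are illustrative assumptions, not values taken from the abstract.

```python
import math


def speech_presence_probability(noisy_power, noise_power,
                                xi_h1_db=15.0, prior_h1=0.5):
    """A posteriori speech presence probability at one time-frequency point.

    Sketch under an assumed complex Gaussian model. xi_h1_db (fixed a priori
    SNR in dB) and prior_h1 (fixed prior probability of speech presence) are
    hypothetical defaults chosen for illustration.
    """
    if noise_power <= 0.0:
        raise ValueError("noise_power must be positive")
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")

    xi = 10.0 ** (xi_h1_db / 10.0)      # fixed a priori SNR, linear scale
    gamma = noisy_power / noise_power   # observed-to-noise power ratio

    # Log of the prior-weighted likelihood ratio p(y|H1)P(H1) / p(y|H0)P(H0).
    log_ratio = (math.log(prior_h1 / (1.0 - prior_h1))
                 - math.log1p(xi)
                 + gamma * xi / (1.0 + xi))

    # Convert the log ratio to a posterior probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_ratio))


if __name__ == "__main__":
    # Sweep the observed-to-noise power ratio from 0 to 12 in steps of 2.
    for gamma in range(0, 13, 2):
        p = speech_presence_probability(float(gamma), 1.0)
        print(f"gamma = {gamma:2d}  ->  P(H1|y) = {p:.4f}")
```

Because both priors are fixed, the probability depends on the observation only through gamma. In this sketch the result stays near zero when the observed power is at or below the noise level and approaches one when it is well above it.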