Improved A Posteriori Speech Presence Probability Estimation Based on a Likelihood Ratio With Fixed Priors

In this paper, we present an improved estimator for the speech presence probability at each time-frequency point in the short-time Fourier transform domain. In contrast to existing approaches, this estimator does not rely on an adaptively estimated and thus signal-dependent a priori signal-to-noise ratio estimate. It therefore decouples the estimation of the speech presence probability from the estimation of the clean speech spectral coefficients in a speech enhancement task. Using both a fixed a priori signal-to-noise ratio and a fixed prior probability of speech presence, the proposed a posteriori speech presence probability estimator achieves probabilities close to zero for speech absence and probabilities close to one for speech presence. While state-of-the-art speech presence probability estimators use adaptive prior probabilities and signal-to-noise ratio estimates, we argue that these quantities should reflect true a priori information that shall not depend on the observed signal. We present a detection theoretic framework for determining the fixed a priori signal-to-noise ratio. The proposed estimator is conceptually simple and yields a better tradeoff between speech distortion and noise leakage than state-of-the-art estimators.

[1]  R. McAulay,et al.  Speech enhancement using a soft-decision noise suppression filter , 1980 .

[2]  Peter Vary,et al.  Digital Speech Transmission: Enhancement, Coding and Error Concealment , 2006 .

[3]  DeLiang Wang,et al.  Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, IEEE Transactions on Neural Networks.

[4]  H. V. Trees Detection, Estimation, And Modulation Theory , 2001 .

[5]  Israel Cohen,et al.  Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging , 2003, IEEE Trans. Speech Audio Process..

[6]  Harry L. Van Trees,et al.  Detection, Estimation, and Modulation Theory, Part I , 1968 .

[7]  Rainer Martin,et al.  Cepstral Smoothing of Spectral Filter Gains for Speech Enhancement Without Musical Noise , 2007, IEEE Signal Processing Letters.

[8]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[9]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[10]  Wonyong Sung,et al.  A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.

[11]  M. Melamed Detection , 2021, SETI: Astronomy as a Contact Sport.

[12]  Jesper Jensen,et al.  Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  Israel Cohen,et al.  Speech enhancement for non-stationary noise environments , 2001, Signal Process..

[14]  David Malah,et al.  Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[15]  Yariv Ephraim,et al.  Recent Advancements in Speech Enhancement , 2004 .

[16]  Ephraim Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[17]  Olivier Cappé,et al.  Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor , 1994, IEEE Trans. Speech Audio Process..

[18]  Søren Vang Andersen,et al.  Speech Enhancement with Natural Sounding Residual Noise Based on Connected Time-Frequency Speech Presence Regions , 2005, EURASIP J. Adv. Signal Process..