Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging

Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. We present an improved minima controlled recursive averaging (IMCRA) approach for noise estimation in adverse environments involving nonstationary noise, weak speech components, and low input signal-to-noise ratio (SNR). The noise estimate is obtained by averaging past spectral power values, using a time-varying, frequency-dependent smoothing parameter that is adjusted by the speech presence probability. The speech presence probability is, in turn, controlled by the minima values of a smoothed periodogram. The proposed procedure comprises two iterations of smoothing and minimum tracking. The first iteration provides a rough voice activity detection in each frequency band. Smoothing in the second iteration then excludes relatively strong speech components, which makes the minimum tracking robust during speech activity. We show that the IMCRA approach is very effective in nonstationary noise environments and under low SNR conditions. In particular, compared to a competitive method, it obtains a lower estimation error and, when integrated into a speech enhancement system, achieves improved speech quality and lower residual noise.
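The averaging step described above can be illustrated with a minimal sketch. It is not the paper's full procedure: the abstract does not give the update rule, so the form used here (a base smoothing constant pushed toward 1 by the speech presence probability) is an assumption, and the names `update_noise_estimate`, `sliding_minimum`, `alpha_d`, and `window` are hypothetical. The two-iteration smoothing and the computation of the presence probability itself are omitted.

```python
def update_noise_estimate(noise_psd, periodogram, presence_prob, alpha_d=0.85):
    """One frame of probability-controlled recursive averaging.

    Each argument holds one value per frequency bin. alpha_d is an
    assumed base smoothing constant, not a value taken from the text.
    """
    updated = []
    for noise, power, p in zip(noise_psd, periodogram, presence_prob):
        # Assumed rule: the smoothing parameter approaches 1 as speech
        # presence becomes likely, which freezes the estimate in that bin.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * noise + (1.0 - alpha) * power)
    return updated


def sliding_minimum(smoothed_frames, window):
    """Per-bin minimum over the last `window` frames of a smoothed periodogram."""
    recent = smoothed_frames[-window:]
    return [min(bin_values) for bin_values in zip(*recent)]
```

In this sketch a bin with presence probability 1 keeps its previous noise estimate unchanged, while a bin with probability 0 is averaged with the new spectral power using the base constant alone.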
