Impact of SNR and gain-function over- and under-estimation on speech intelligibility

Most noise reduction algorithms depend on reliable estimates of the SNR in each frequency bin. For that reason, much work has analyzed the behavior and performance of SNR estimation algorithms in the context of improving speech quality and reducing speech distortions (e.g., musical noise). Comparatively little work, however, has examined how errors in SNR estimation affect speech intelligibility. It is not known, for instance, whether SNR overestimation errors, SNR underestimation errors, or both are harmful to speech intelligibility. Errors in SNR estimation produce concomitant errors in the computation of the gain (suppression) function, and the impact of these gain estimation errors on speech intelligibility is unclear. The present study assesses the effect of SNR estimation errors on gain function estimation via sensitivity analysis. Intelligibility listening studies were conducted to validate the sensitivity analysis. Results indicated that speech intelligibility is severely compromised when SNR and gain overestimation errors are introduced in spectral components with negative SNR. A theoretical upper bound on the gain function is derived that can be used to constrain the gain values so that SNR overestimation errors are minimized. Speech enhancement algorithms that limit the gain function to fall within this upper bound can improve speech intelligibility.
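
The link between SNR estimation errors and gain errors can be illustrated with a minimal sketch. It assumes a Wiener-type gain, G = xi / (1 + xi), which is a common suppression rule but not necessarily the one analyzed in the study; the function names and the 5 dB error magnitude are hypothetical choices made only for illustration. The sketch shows how the same SNR estimation error translates into different gain errors depending on the true SNR of the bin.

```python
import math


def wiener_gain(snr_db):
    """Wiener-type gain G = xi / (1 + xi), with xi the SNR in linear units.

    Assumption: this gain rule is chosen for illustration only.
    """
    xi = 10.0 ** (snr_db / 10.0)
    return xi / (1.0 + xi)


def gain_error_db(true_snr_db, error_db):
    """Change in gain (dB) when the SNR estimate is off by error_db."""
    g_true = wiener_gain(true_snr_db)
    g_est = wiener_gain(true_snr_db + error_db)
    return 20.0 * math.log10(g_est / g_true)


if __name__ == "__main__":
    # Hypothetical error magnitude of 5 dB in either direction.
    for true_snr_db in (-10.0, 0.0, 10.0):
        over = gain_error_db(true_snr_db, +5.0)
        under = gain_error_db(true_snr_db, -5.0)
        print(f"true SNR {true_snr_db:+5.1f} dB: "
              f"overestimate -> {over:+.2f} dB gain, "
              f"underestimate -> {under:+.2f} dB gain")
```

Under this assumed gain rule, a fixed SNR overestimation error raises the gain far more in bins with negative SNR than in bins with positive SNR, so noise-dominated components are under-suppressed. The sketch says nothing about intelligibility itself, which the study assesses through listening tests.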
