Reasons why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions

Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for that are unclear. In the present paper, we present a theoretical framework that can be used to analyze potential factors that can influence the intelligibility of processed speech. More specifically, this framework focuses on the fine-grain analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, then large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted with human listeners in which we present processed speech with controlled speech distortions. The aim of these tests is to assess the perceptual effect of the various distortions that can be introduced by speech enhancement algorithms on speech intelligibility. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility degradation than others. When these distortions were properly controlled, however, large gains in intelligibility were obtained by human listeners, even by spectral-subtractive algorithms which are known to degrade speech quality and intelligibility.

[1]  Phil D. Green,et al.  Recognition of speech separated from acoustic mixtures , 1994 .

[2]  Yi Hu,et al.  Subjective comparison and evaluation of speech enhancement algorithms , 2007, Speech Commun..

[3]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[4]  CookeMartin,et al.  Robust automatic speech recognition with missing and unreliable acoustic data , 2001 .

[5]  P. Renevey,et al.  Detection of Reliable Features for Speech Recognition in Noisy Condi-tions Using a Statistical Criterion , 2001 .

[6]  DeLiang Wang,et al.  Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation. , 2006, The Journal of the Acoustical Society of America.

[7]  Sven Nordholm,et al.  Spectral subtraction using reduced delay convolution and adaptive averaging , 2001, IEEE Trans. Speech Audio Process..

[8]  J. C. Steinberg,et al.  Factors Governing the Intelligibility of Speech Sounds , 1945 .

[9]  Yi Hu,et al.  A comparative intelligibility study of single-microphone noise reduction algorithms. , 2007, The Journal of the Acoustical Society of America.

[10]  DeLiang Wang,et al.  A multistage approach for blind separation of convolutive speech mixtures , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[11]  Yi Hu,et al.  A new sound coding strategy for suppressing noise in cochlear implants. , 2008, The Journal of the Acoustical Society of America.

[12]  R. Bentler,et al.  Digital noise reduction: Outcomes from laboratory and field studies , 2008, International journal of audiology.

[13]  Pascal Scalart,et al.  Speech enhancement based on a priori signal to noise estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[14]  K. S. Rhebergen,et al.  Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise. , 2006, The Journal of the Acoustical Society of America.

[15]  Hiroshi Sawada,et al.  Reducing musical noise by a fine-shift overlap-add method applied to source separation using a time-frequency mask , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[16]  Daniel P. W. Ellis,et al.  Speech separation using speaker-adapted eigenvoice speech models , 2010, Comput. Speech Lang..

[17]  DeLiang Wang,et al.  On the optimality of ideal binary time-frequency masks , 2009, Speech Commun..

[18]  Ephraim Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[19]  Rainer Martin,et al.  Noise power spectral density estimation based on optimal smoothing and minimum statistics , 2001, IEEE Trans. Speech Audio Process..

[20]  Guy J. Brown,et al.  Computational auditory scene analysis , 1994, Comput. Speech Lang..

[21]  K. D. Kryter Methods for the Calculation and Use of the Articulation Index , 1962 .

[22]  Philipos C. Loizou,et al.  A noise-estimation algorithm for highly non-stationary environments , 2006, Speech Commun..

[23]  Ronald E. Crochiere,et al.  A study of complexity and quality of speech waveform coders , 1978, ICASSP.

[24]  Schuyler Quackenbush,et al.  Objective measures of speech quality , 1995 .

[25]  Yang Lu,et al.  An algorithm that improves speech intelligibility in noise for normal-hearing listeners. , 2009, The Journal of the Acoustical Society of America.

[26]  Daniel P. W. Ellis,et al.  Model-Based Scene Analysis , 2005 .

[27]  Y. Hu,et al.  TECHNIQUES FOR ESTIMATING THE IDEAL BINARY MASK , 2008 .

[28]  IEEE Recommended Practice for Speech Quality Measurements , 1969, IEEE Transactions on Audio and Electroacoustics.

[29]  C V Pavlovic,et al.  Derivation of primary parameters and procedures for use in speech intelligibility predictions. , 1987, The Journal of the Acoustical Society of America.

[30]  Yi Hu,et al.  A generalized subspace approach for enhancing speech corrupted by colored noise , 2003, IEEE Trans. Speech Audio Process..

[31]  Philipos C. Loizou,et al.  Speech Quality Assessment , 2011, Multimedia Analysis, Processing and Communications.

[32]  DeLiang Wang,et al.  On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.

[33]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[34]  Jae Lim,et al.  Evaluation of a correlation subtraction method for enhancing speech degraded by additive white noise , 1978 .

[35]  C V Pavlovic,et al.  An articulation index based procedure for predicting the speech recognition performance of hearing-impaired individuals. , 1986, The Journal of the Acoustical Society of America.

[36]  P. Loizou,et al.  Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. , 2008, The Journal of the Acoustical Society of America.

[37]  D. Wang,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006, IEEE Trans. Neural Networks.

[38]  K. D. Kryter Validation of the Articulation Index , 1962 .

[39]  DeLiang Wang,et al.  Speech intelligibility in background noise with ideal binary time-frequency masking. , 2009, The Journal of the Acoustical Society of America.

[40]  Yi Hu,et al.  Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. , 2009, The Journal of the Acoustical Society of America.

[41]  Yi Hu,et al.  Evaluation of Objective Quality Measures for Speech Enhancement , 2008, IEEE Transactions on Audio, Speech, and Language Processing.