An improved speech transmission index for intelligibility prediction

The speech transmission index (STI) is a well known measure of intelligibility, most suited to the evaluation of speech intelligibility in rooms, with stimuli subjected to additive noise and reverberance. However, STI and its many variations do not effectively represent the intelligibility of stimuli containing non-linear distortions such as those resulting from processing by enhancement algorithms. In this paper, we revisit the STI approach and propose a variation which processes the modulation envelope in short-time segments, requiring only an assumption of quasi-stationarity (rather than the stationarity assumption of STI) of the modulation signal. Results presented in this work show that the proposed approach improves the measures correlation to subjective intelligibility scores compared to traditional STI for a range of noise types and subjected to different enhancement approaches. The approach is also shown to have higher correlation than other coherence, correlation and distance measures tested, but is unsuited to the evaluation of stimuli heavily distorted with (for example) masking based processing, where an alternative approach such as STOI is recommended.

[1]  Jesper Jensen,et al.  Intelligibility Prediction of Single-Channel Noise-Reduced Speech , 2010, Sprachkommunikation.

[2]  David Pearce,et al.  The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions , 2000, INTERSPEECH.

[3]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[4]  Methods for objective and subjective assessment of quality Perceptual evaluation of speech quality ( PESQ ) : An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs , 2002 .

[5]  Jesper Jensen,et al.  An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  B Kollmeier,et al.  Speech intelligibility prediction in hearing-impaired listeners based on a psychoacoustically motivated perception model. , 1996, The Journal of the Acoustical Society of America.

[7]  A. B.,et al.  SPEECH COMMUNICATION , 2001 .

[8]  Ronald E. Crochiere,et al.  A study of complexity and quality of speech waveform coders , 1978, ICASSP.

[9]  T Houtgast,et al.  A physical method for measuring speech-transmission quality. , 1980, The Journal of the Acoustical Society of America.

[10]  David Malah,et al.  Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..

[11]  IEEE Recommended Practice for Speech Quality Measurements , 1969, IEEE Transactions on Audio and Electroacoustics.

[12]  James M Kates,et al.  Coherence and the speech intelligibility index. , 2004, The Journal of the Acoustical Society of America.

[13]  Kuldip K. Paliwal,et al.  Role of modulation magnitude and phase spectrum towards speech intelligibility , 2011, Speech Commun..

[14]  Torsten Dau,et al.  Prediction of speech intelligibility based on an auditory preprocessing model , 2010, Speech Commun..

[15]  Ephraim Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .

[16]  R. Plomp,et al.  Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.

[17]  Jesper Jensen,et al.  Minimum Mean-Square Error Estimation of Discrete Fourier Coefficients With Generalized Gamma Priors , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[18]  Daniel P. W. Ellis,et al.  A simple correlation-based model of intelligibility for nonlinear speech enhancement and separation , 2009, 2009 17th European Signal Processing Conference.

[19]  Raymond L. Goldsworthy,et al.  Analysis of speech-based Speech Transmission Index methods with implications for nonlinear operations. , 2004, The Journal of the Acoustical Society of America.

[20]  M. W. Green,et al.  2. Handbook of the Logistic Distribution , 1991 .

[21]  G. Carter,et al.  Estimation of the magnitude-squared coherence function via overlapped fast Fourier transform processing , 1973 .

[22]  T. Houtgast,et al.  A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria , 1985 .

[23]  L D Braida,et al.  A method to determine the speech transmission index from speech waveforms. , 1999, The Journal of the Acoustical Society of America.

[24]  Yi Hu,et al.  A comparative intelligibility study of single-microphone noise reduction algorithms. , 2007, The Journal of the Acoustical Society of America.

[25]  Philipos C. Loizou,et al.  SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech , 2011, Speech Commun..

[26]  Yi Hu,et al.  Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. , 2009, The Journal of the Acoustical Society of America.

[27]  J. C. Steinberg,et al.  Factors Governing the Intelligibility of Speech Sounds , 1945 .

[28]  Schuyler Quackenbush,et al.  Objective measures of speech quality , 1995 .