Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data.

Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.

[1]  T. Houtgast,et al.  Predicting speech intelligibility in rooms from the modulation transfer function, I. General room acoustics , 1980 .

[2]  N I Durlach,et al.  Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech. , 1985, Journal of speech and hearing research.

[3]  T. Houtgast,et al.  A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria , 1985 .

[4]  P. Peterson Simulating the response of multiple microphones to a single acoustic source in a reverberant room. , 1986, The Journal of the Acoustical Society of America.

[5]  R. Drullman Temporal envelope and fine structure cues for speech intelligibility , 1994 .

[6]  R. Plomp,et al.  Effect of reducing slow temporal modulations on speech reception. , 1994, The Journal of the Acoustical Society of America.

[7]  L D Braida,et al.  Intelligibility of conversational and clear speech in noise and reverberation for listeners with normal and impaired hearing. , 1994, The Journal of the Acoustical Society of America.

[8]  L D Braida,et al.  A method to determine the speech transmission index from speech waveforms. , 1999, The Journal of the Acoustical Society of America.

[9]  Jean C. Krause,et al.  Properties of naturally produced clear speech at normal rates and implications for intelligibility enhancement , 2001 .

[10]  Jean C. Krause,et al.  Investigating alternative forms of clear speech: the effects of speaking rate and speaking mode on intelligibility. , 2002, The Journal of the Acoustical Society of America.

[11]  Raymond L. Goldsworthy,et al.  Analysis of speech-based Speech Transmission Index methods with implications for nonlinear operations. , 2004, The Journal of the Acoustical Society of America.

[12]  Jean C. Krause,et al.  Acoustic properties of naturally produced clear speech at normal speaking rates. , 1996, The Journal of the Acoustical Society of America.

[13]  K. S. Rhebergen,et al.  A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners. , 2005, The Journal of the Acoustical Society of America.

[14]  K. S. Rhebergen,et al.  Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise. , 2006, The Journal of the Acoustical Society of America.

[15]  Tammo Houtgast,et al.  A detailed study on the effects of noise on speech intelligibility. , 2007, The Journal of the Acoustical Society of America.

[16]  Tammo Houtgast,et al.  The combined effects of reverberation and nonstationary noise on sentence intelligibility. , 2008, The Journal of the Acoustical Society of America.

[17]  Frederick J. Gallun,et al.  Exploring the Role of the Modulation Spectrum in Phoneme Recognition , 2008, Ear and hearing.

[18]  Yi Hu,et al.  Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions. , 2009, The Journal of the Acoustical Society of America.

[19]  Tiago H. Falk,et al.  A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Jesper Jensen,et al.  An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.