A multi-resolution envelope-power based model for speech intelligibility.

The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes in speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, because it integrates the envelope power over the long term, and it cannot account for the increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented in which the SNRenv is estimated in temporal segments with a modulation-filter-dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and with noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
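The segment-wise estimate described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formulas, so several things are assumptions made here for illustration: the segment duration is taken as the inverse of the modulation filter's center frequency `fc`; the envelope power of a segment is taken as the variance of the modulation-filtered envelope; the ratio is formed as (P_mix - P_noise) / P_noise; and the function name `segment_snr_env` and the small `floor` value are hypothetical.

```python
import numpy as np

def segment_snr_env(env_mix, env_noise, fs, fc, floor=1e-3):
    """Sketch of a segment-wise envelope-power SNR for one modulation filter.

    env_mix, env_noise: modulation-filtered envelopes of the noisy speech
    and of the noise alone (same length, sampled at fs Hz).
    fc: center frequency of the modulation filter in Hz.
    All formulas below are illustrative assumptions, not the published model.
    """
    env_mix = np.asarray(env_mix, dtype=float)
    env_noise = np.asarray(env_noise, dtype=float)

    # Assumed: one segment lasts 1/fc seconds, so faster modulation
    # filters are analyzed with shorter segments.
    seg_len = max(1, int(round(fs / fc)))
    n_seg = len(env_mix) // seg_len

    snrs = np.empty(n_seg)
    for i in range(n_seg):
        seg = slice(i * seg_len, (i + 1) * seg_len)
        # Assumed: envelope power is the variance within the segment,
        # floored to avoid division by zero.
        p_mix = max(np.var(env_mix[seg]), floor)
        p_noise = max(np.var(env_noise[seg]), floor)
        # Speech envelope power is estimated as mixture minus noise.
        snrs[i] = max(p_mix - p_noise, floor) / p_noise
    return snrs

# Toy usage: a 4 Hz envelope sampled at 1 kHz gives 250-sample segments.
t = np.arange(1000) / 1000.0
noise_env = np.sin(2 * np.pi * 4 * t)
mix_env = 2.0 * np.sin(2 * np.pi * 4 * t)
snr_per_segment = segment_snr_env(mix_env, noise_env, fs=1000, fc=4.0)
```

In this toy case the mixture envelope has four times the power of the noise envelope, so every segment yields a ratio of about 3. With a fluctuating masker, segments that fall in the masker's dips would yield larger ratios than a single long-term estimate, which is the effect the multi-resolution scheme is meant to capture.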

[1]  G. A. Miller, et al. The masking of speech, 1947, Psychological bulletin.

[2]  G. A. Miller, et al. The Intelligibility of Interrupted Speech, 1948.

[3]  D. Dirks, et al. Effect of pulsed masking on selected speech materials, 1969, The Journal of the Acoustical Society of America.

[4]  A. M. Mimpen, et al. Improving the reliability of testing the speech reception threshold for sentences, 1979, Audiology: official organ of the International Society of Audiology.

[5]  T. Houtgast, et al. Predicting speech intelligibility in rooms from the modulation transfer function, I. General room acoustics, 1980.

[6]  B Hagerman, et al. Sentences for testing speech intelligibility in noise, 1982, Scandinavian audiology.

[7]  Brian R Glasberg, et al. Derivation of auditory filter shapes from notched-noise data, 1990, Hearing Research.

[8]  R. Plomp, et al. Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing, 1990, The Journal of the Acoustical Society of America.

[9]  S. Rosen, et al. Uncomodulated glimpsing in "checkerboard" noise, 1993, The Journal of the Acoustical Society of America.

[10]  S. Soli, et al. Development of the Hearing in Noise Test for the measurement of speech reception thresholds in quiet and in noise, 1994, The Journal of the Acoustical Society of America.

[11]  T. Dau, et al. Intrinsic envelope fluctuations and modulation-detection thresholds for narrow-band noise carriers, 1996, The Journal of the Acoustical Society of America.

[12]  B. Kollmeier, et al. Modeling auditory processing of amplitude modulation. II. Spectral and temporal integration, 1997, The Journal of the Acoustical Society of America.

[13]  B. Kollmeier, et al. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers, 1997, The Journal of the Acoustical Society of America.

[14]  B. Kollmeier, et al. Within-channel cues in comodulation masking release (CMR): experiments and model predictions using a modulation-filterbank model, 1999, The Journal of the Acoustical Society of America.

[15]  T. Dau, et al. Characterizing frequency selectivity for envelope fluctuations, 2000, The Journal of the Acoustical Society of America.

[16]  Kohlrausch, et al. The influence of carrier level and frequency on modulation and beat-detection thresholds for sinusoidal carriers, 2000, The Journal of the Acoustical Society of America.

[17]  H Müsch, et al. Using statistical decision theory to predict speech intelligibility. I. Model structure, 2001, The Journal of the Acoustical Society of America.

[18]  T. Dau, et al. Spectro-temporal processing in the envelope-frequency domain, 2002, The Journal of the Acoustical Society of America.

[19]  Mounya Elhilali, et al. A spectro-temporal modulation index (STMI) for assessment of speech intelligibility, 2003, Speech Commun.

[20]  K. Wagener, et al. Design, optimization and evaluation of a Danish sentence test in noise: Diseño, optimización y evaluación de la prueba Danesa de frases en ruido, 2003, International journal of audiology.

[21]  K. S. Rhebergen, et al. A Speech Intelligibility Index-based approach to predict the speech reception threshold for sentences in fluctuating noise for normal-hearing listeners, 2005, The Journal of the Acoustical Society of America.

[22]  Martin Cooke, et al. A glimpsing model of speech perception in noise, 2006, The Journal of the Acoustical Society of America.

[23]  T. Houtgast, et al. Factors affecting masking release for speech in modulated noise for normal-hearing and hearing-impaired listeners, 2006, The Journal of the Acoustical Society of America.

[24]  K. S. Rhebergen, et al. Extended speech intelligibility index for the prediction of the speech reception threshold in fluctuating noise, 2006, The Journal of the Acoustical Society of America.

[25]  Tammo Houtgast, et al. The combined effects of reverberation and nonstationary noise on sentence intelligibility, 2008, The Journal of the Acoustical Society of America.

[26]  T. Houtgast, et al. The concept of signal-to-noise ratio in the modulation domain and speech intelligibility, 2008, The Journal of the Acoustical Society of America.

[27]  Torsten Dau, et al. Relations between frequency selectivity, temporal fine-structure processing, and speech reception in impaired hearing, 2009, The Journal of the Acoustical Society of America.

[28]  Deliang Wang, et al. Role of mask pattern in intelligibility of ideal binary-masked noisy speech, 2009, The Journal of the Acoustical Society of America.

[29]  N. J. Versfeld, et al. The dynamic range of speech, compression, and its effect on the speech reception threshold in stationary and interrupted noise, 2009, The Journal of the Acoustical Society of America.

[30]  Torsten Dau, et al. Development of a Danish speech intelligibility test, 2009, International journal of audiology.

[31]  Emily Buss, et al. Masking release for words in amplitude-modulated noise as a function of modulation rate and task, 2009, The Journal of the Acoustical Society of America.

[32]  Birger Kollmeier, et al. Development and analysis of an International Speech Test Signal (ISTS), 2010, International journal of audiology.

[33]  Torsten Dau, et al. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing, 2011, The Journal of the Acoustical Society of America.