Monaural speech intelligibility and detection in maskers with varying amounts of spectro-temporal speech features.

Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur, which are typically referred to as energetic, amplitude modulation, and informational masking. In this study, speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech of the same or a different gender. Observed data were compared to predictions of the speech intelligibility index, the extended speech intelligibility index, the multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. The comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. In addition, all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models.
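The idea of short-term energetic masking can be illustrated with a minimal sketch. It is not one of the models evaluated in the study: it works on the broadband signal only, uses no band-importance weights or auditory filterbank, and uses Gaussian noise as a stand-in for both target and masker. The function names (`normalize_rms`, `short_term_audibility`), the 20 ms frame length, and the 4 Hz masker modulation are assumptions made for illustration. The sketch borrows only the clipped (SNR + 15)/30 audibility mapping known from the speech intelligibility index and averages it over short frames, in the spirit of short-term index approaches.

```python
import numpy as np


def normalize_rms(x):
    """Scale a signal to unit long-term RMS."""
    return x / np.sqrt(np.mean(x ** 2))


def short_term_audibility(target, masker, frame_len):
    """Average a clipped frame-wise SNR mapping over time.

    Broadband simplification (hypothetical helper): each frame's SNR in dB
    is clipped to [-15, 15] and mapped linearly to [0, 1].
    """
    n_frames = min(len(target), len(masker)) // frame_len
    eps = 1e-12  # guards against division by zero in silent frames
    values = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        p_target = np.mean(target[seg] ** 2) + eps
        p_masker = np.mean(masker[seg] ** 2) + eps
        snr_db = 10.0 * np.log10(p_target / p_masker)
        values.append((np.clip(snr_db, -15.0, 15.0) + 15.0) / 30.0)
    return float(np.mean(values))


fs = 16000                      # sampling rate in Hz (assumed)
n = 2 * fs                      # 2 s of signal
t = np.arange(n) / fs
rng = np.random.default_rng(0)

# Noise stand-ins; real stimuli would be speech and speech-based maskers.
target = normalize_rms(rng.standard_normal(n))
steady = normalize_rms(rng.standard_normal(n))
# Same carrier, fully modulated at 4 Hz, rescaled to the same long-term RMS.
fluctuating = normalize_rms(steady * (1.0 + np.cos(2 * np.pi * 4 * t)))

frame_len = int(0.020 * fs)     # 20 ms frames (assumed)
idx_steady = short_term_audibility(target, steady, frame_len)
idx_fluct = short_term_audibility(target, fluctuating, frame_len)
print(f"steady masker: {idx_steady:.3f}, fluctuating masker: {idx_fluct:.3f}")
```

Both maskers have the same long-term level, so a long-term measure would treat them alike. The frame-wise average comes out higher for the fluctuating masker because the target is more audible in the masker dips, which is the effect a short-term energetic account is meant to capture.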
