Evaluating a distortion-weighted glimpsing metric for predicting binaural speech intelligibility in rooms

A distortion-weighted glimpse proportion metric (BiDWGP) for predicting binaural speech intelligibility was evaluated in simulated anechoic and reverberant conditions, with and without a noise masker. The predictive performance of BiDWGP was compared to that of four reference binaural intelligibility metrics, which were extended from the Speech Intelligibility Index (SII) and the Speech Transmission Index (STI). In the anechoic sound field, BiDWGP demonstrated high accuracy in predicting binaural intelligibility for individual maskers (ρ ≥ 0.95) and across maskers (ρ ≥ 0.94). The reference metrics, however, performed less well in across-masker prediction (0.54 ≤ ρ ≤ 0.86) despite reasonable accuracy for individual maskers. In reverberant rooms, BiDWGP was more stable across all test conditions (ρ ≥ 0.87) than the reference metrics, which showed different predictive patterns: the binaural STIs were more robust for the stationary than for the fluctuating noise masker, whilst the binaural SII displayed the opposite behaviour. The study shows that the new BiDWGP metric can provide similar or even more robust predictive power than the current standard metrics.
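The distinction between per-masker and across-masker accuracy can be illustrated with a minimal sketch. It assumes the reported figures are correlation coefficients between metric predictions and listener scores, and it uses Pearson correlation, which the abstract does not specify. All numbers below are hypothetical toy values, not data from the study, and the names `pearson` and `scores` are introduced only for this illustration. The sketch shows one way a metric can track intelligibility well within each masker yet less well when maskers are pooled: the mapping from metric value to score differs between maskers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: (metric predictions, listener scores in %) per masker.
# Each masker follows its own linear mapping from prediction to score.
scores = {
    "stationary": ([0.2, 0.4, 0.6, 0.8], [20.0, 40.0, 60.0, 80.0]),
    "fluctuating": ([0.2, 0.4, 0.6, 0.8], [50.0, 65.0, 80.0, 95.0]),
}

# Correlation computed separately for each masker.
per_masker = {name: pearson(pred, obs) for name, (pred, obs) in scores.items()}

# Correlation computed on the pooled data from both maskers.
pooled_pred = [v for pred, _ in scores.values() for v in pred]
pooled_obs = [v for _, obs in scores.values() for v in obs]
across_maskers = pearson(pooled_pred, pooled_obs)

print(per_masker)
print(across_maskers)
```

In this toy case the per-masker correlations are essentially perfect while the pooled correlation is lower, because the same metric value corresponds to different scores under the two maskers. A metric that is robust across maskers would need a single mapping that holds for both.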
