Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions

Abstract

Speech intelligibility prediction methods have recently gained popularity in the speech processing community as supplements to time-consuming and costly listening experiments. Such methods can be used to objectively quantify and compare the benefit of different speech enhancement algorithms in a way that correlates well with actual speech intelligibility. One such method is the short-time objective intelligibility (STOI) measure. In a recent publication, we proposed a binaural version of the STOI measure based on a modified version of the equalization-cancellation (EC) model. This measure was shown to retain many of the advantageous properties of the STOI measure while also predicting intelligibility correctly in conditions involving both binaural advantage and non-linear signal processing. The largest prediction errors were found for conditions involving multiple spatially distributed interferers. In this paper, we report results from a new listening experiment including different mixtures of isotropic and point-source noise. These results reveal that the binaural STOI measure tends to overestimate intelligibility in conditions with spatially distributed interferers at low signal-to-noise ratios (SNRs). This condition-dependent error can make it difficult to compare intelligibility across different acoustic conditions. We investigate the cause of this upward bias and propose a correction that alleviates the problem. The modified method is evaluated on five datasets of measured intelligibility spanning a wide range of realistic acoustic conditions. Within the tested conditions, the modified method yields very accurate predictions and entirely removes the aforementioned tendency to overestimate intelligibility in conditions with spatially distributed interferers.
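To make the core idea of a short-time objective intelligibility measure concrete, the following is a minimal sketch of a monaural STOI-style computation as it is commonly described: short segments of clean and degraded band envelopes are compared by correlation after energy normalization and clipping, and the result is averaged over bands and segments. It is an illustration under stated assumptions, not the binaural method or the correction proposed in this paper. The function names `stoi_core` and `_corr` are hypothetical, the envelopes are assumed to be precomputed (the one-third-octave filterbank front end, silent-frame removal, and the EC stage are all omitted), and the 30-frame segment length and -15 dB clipping bound are assumed defaults.

```python
import math


def _corr(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den if den > 0 else 0.0


def stoi_core(clean_env, degraded_env, seg_len=30, beta_db=-15.0):
    """Simplified monaural STOI-style score from precomputed band envelopes.

    clean_env, degraded_env: lists of bands, each a list of per-frame
    envelope magnitudes (assumed to come from a one-third-octave
    filterbank that is not shown here).
    """
    n_frames = len(clean_env[0])
    if n_frames < seg_len:
        raise ValueError("need at least seg_len frames")
    # Upper bound on the scaled degraded envelope, which limits the
    # signal-to-distortion ratio to beta_db.
    clip = 1.0 + 10.0 ** (-beta_db / 20.0)
    scores = []
    for band_x, band_y in zip(clean_env, degraded_env):
        # Slide a seg_len-frame analysis segment over the band.
        for m in range(seg_len, n_frames + 1):
            x = band_x[m - seg_len:m]
            y = band_y[m - seg_len:m]
            # Scale the degraded segment to the clean segment's energy.
            norm_y = math.sqrt(sum(v * v for v in y))
            alpha = math.sqrt(sum(v * v for v in x)) / norm_y if norm_y > 0 else 0.0
            y_bar = [min(alpha * yv, clip * xv) for xv, yv in zip(x, y)]
            # Intermediate intelligibility: envelope correlation.
            scores.append(_corr(x, y_bar))
    # Final score: average over all bands and segments.
    return sum(scores) / len(scores)
```

Because of the energy normalization and the use of correlation, identical envelopes, or a degraded envelope that is a positively scaled or offset copy of the clean one, give a score of 1 (up to rounding), while envelopes whose temporal structure has been destroyed give lower scores. In the binaural measure discussed in this paper, an EC-based stage would precede this comparison.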
