Predicting Binaural Speech Intelligibility from Signals Estimated by a Blind Source Separation Algorithm

State-of-the-art binaural objective intelligibility measures (OIMs) require the individual source signals to make intelligibility predictions, which limits their usability in real-time online operation. This limitation may be addressed by a blind source separation (BSS) process, which can extract the underlying sources from a mixture. In this study, a speech source is presented with either a stationary noise masker or a fluctuating noise masker whose azimuth varies in the horizontal plane, at two speech-to-noise ratios (SNRs). Three binaural OIMs are used to predict speech intelligibility from the signals separated by a BSS algorithm. The model predictions are compared with listeners' word identification rates in a perceptual listening experiment. The results suggest that, with SNR compensation applied to the BSS-separated speech signal, the OIMs can maintain their predictive power for individual maskers relative to their performance when computed from the direct signals. The results also reveal that errors in SNR between the estimated signals are not the only factor that decreases the predictive accuracy of the OIMs on the separated signals: artefacts or distortions introduced into the estimated signals by the BSS algorithm may also contribute.
