Intelligibility prediction for speech mixed with white Gaussian noise at low signal-to-noise ratios.

The effect of additive white Gaussian noise and high-pass filtering on speech intelligibility was evaluated at signal-to-noise ratios (SNRs) from -26 to 0 dB using British English talkers and normal-hearing listeners. SNRs below -10 dB were included because they are relevant to speech security applications. Eight objective metrics were assessed: short-time objective intelligibility (STOI), a proposed variant termed STOI+, extended short-time objective intelligibility (ESTOI), the normalised covariance metric (NCM), the normalised subband envelope correlation metric (NSEC), two metrics derived from the coherence speech intelligibility index (CSIIMid and CSIIHigh), and the speech transmission index (STI) computed with an envelope-based regression method. For speech and noise mixtures associated with intelligibility scores ranging from 0% to 98%, STOI+ performed at least as well as the other metrics and, under some conditions, better than STOI, ESTOI, STI, NSEC, CSIIMid, and CSIIHigh. Both STOI+ and NCM were associated with relatively low prediction error and bias for intelligibility prediction at SNRs from -26 to 0 dB. STI performed least well in terms of correlation with intelligibility scores, prediction error, bias, and reliability. Logistic regression modelling demonstrated that high-pass filtering, which increases the proportion of high- to low-frequency energy, was detrimental to intelligibility for SNRs between -17 and -5 dB inclusive.
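As an illustration of the stimulus condition described above, the following minimal sketch shows one way to add white Gaussian noise to a signal at a chosen SNR. It is not the authors' procedure: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, the SNR is assumed to be defined from mean power over the whole signal (the study may have used a different level measure), and a sine tone stands in for speech.

```python
import math
import random


def mix_at_snr(speech, target_snr_db, seed=0):
    """Add white Gaussian noise to `speech` so that the mixture has the target SNR.

    Assumption: SNR is defined from mean power over the whole signal.
    Returns the mixture and the scaled noise.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_speech / P_noise), so the required noise power is
    # P_speech / 10^(SNR_dB / 10).
    target_noise_power = speech_power / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise


def measured_snr_db(speech, noise):
    """Return the SNR in dB computed from the mean power of each signal."""
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(speech_power / noise_power)


# Placeholder "speech": one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate)]

# Mix at the lowest SNR covered by the study.
mixture, noise = mix_at_snr(tone, -26.0)
```

Because the noise is rescaled from its own measured power, the SNR of the mixture matches the target to floating-point precision and does not depend on the particular random draw.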
