An Evaluation of Intrusive Instrumental Intelligibility Metrics

Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and $\text{sEPSM}^\text{corr}$. In addition, this paper investigates how well intelligibility metrics generalize to new types of distortion and analyzes why the top-performing metrics perform well. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech distorted by additive noise, reverberation, competing talkers, preprocessing enhancement, and postprocessing enhancement. SIIB and HASPI had the highest performance, achieving average correlations with listening test scores of $\rho = 0.92$ and $\rho = 0.89$, respectively. The high performance of SIIB may, in part, be the result of SIIB's developers having access to all the intelligibility data considered in the evaluation. The results show that intelligibility metrics tend to perform poorly on datasets that were not used during their development. By modifying the original implementations of SIIB and STOI, the advantage of reducing statistical dependencies between input features is demonstrated. Additionally, this paper presents a new version of SIIB called $\text{SIIB}^\text{Gauss}$, which performs similarly to SIIB and HASPI but is two orders of magnitude faster to compute.
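To make the headline figures concrete, the following minimal sketch shows one way an average correlation between metric outputs and listening test scores could be computed across several datasets. The abstract does not say which correlation coefficient is used or whether metric outputs are first mapped to intelligibility scores, so the Pearson coefficient, the plain averaging, the function names `pearson` and `mean_correlation`, and the toy data are all assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_correlation(datasets):
    """Average the per-dataset correlation (assumed aggregation rule).

    `datasets` is a list of (metric_outputs, listening_scores) pairs,
    one pair per listening test.
    """
    rhos = [pearson(pred, score) for pred, score in datasets]
    return sum(rhos) / len(rhos)


# Hypothetical toy data: metric outputs and percent-correct scores
# for two made-up listening tests. These are not values from the paper.
toy = [
    ([0.1, 0.3, 0.5, 0.9], [12.0, 35.0, 60.0, 95.0]),
    ([0.2, 0.4, 0.6, 0.8], [20.0, 30.0, 70.0, 90.0]),
]

print(round(mean_correlation(toy), 3))
```

Computing the correlation per listening test and then averaging keeps tests with different score ranges from dominating the result, which is one plausible reading of "on average" in the abstract.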
