Speech Intelligibility Prediction Based on Mutual Information

This paper addresses the problem of predicting the average intelligibility of noisy and potentially processed speech signals, as observed by a group of normal-hearing listeners. We propose a model that performs this prediction based on the hypothesis that intelligibility is monotonically related to the mutual information between critical-band amplitude envelopes of the clean signal and the corresponding noisy/processed signal. The resulting intelligibility predictor turns out to be a simple function of the mean-square error (MSE) that arises when estimating a clean critical-band amplitude using a minimum mean-square error (MMSE) estimator based on the noisy/processed amplitude. The proposed model predicts that speech intelligibility cannot be improved by any processing of noisy critical-band amplitudes. Furthermore, the proposed intelligibility predictor performs well (ρ > 0.95) in predicting the intelligibility of speech signals contaminated by additive noise and potentially non-linearly processed using time-frequency weighting.
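The link between mutual information and estimation error can be illustrated with a minimal sketch. It is not the paper's predictor, whose exact form is not given in this abstract. The sketch assumes that the clean and noisy amplitudes are jointly Gaussian scalars, in which case the MMSE estimator is linear, its normalized MSE equals 1 − r² (r being the correlation coefficient between clean and noisy amplitudes), and the mutual information is −½ log₂(1 − r²) bits. The function name `gaussian_mi_from_mmse`, the synthetic data, and the SNR values are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_mi_from_mmse(clean, noisy):
    """Return (mutual information in bits, normalized MSE).

    Assumption: clean and noisy are jointly Gaussian scalars, so the
    MMSE estimator of clean given noisy is linear and its normalized
    MSE is 1 - r**2, with r the correlation coefficient.
    """
    clean = np.asarray(clean, dtype=float)
    noisy = np.asarray(noisy, dtype=float)
    r = np.corrcoef(clean, noisy)[0, 1]
    # Normalized MSE of the best linear estimator; floor it to avoid log(0).
    nmse = max(1.0 - r ** 2, np.finfo(float).tiny)
    return -0.5 * np.log2(nmse), nmse


# Synthetic stand-in for one band: unit-variance "clean" values plus
# additive Gaussian noise at a few illustrative SNRs.
rng = np.random.default_rng(0)
clean = rng.standard_normal(20000)

mi_by_snr = {}
nmse_by_snr = {}
for snr_db in (-10, 0, 10):
    noise_std = 10 ** (-snr_db / 20)
    noisy = clean + noise_std * rng.standard_normal(clean.size)
    mi, nmse = gaussian_mi_from_mmse(clean, noisy)
    mi_by_snr[snr_db] = mi
    nmse_by_snr[snr_db] = nmse
    print(f"SNR {snr_db:+d} dB: normalized MSE = {nmse:.3f}, MI = {mi:.3f} bits")
```

Under these assumptions, a lower estimation error corresponds to higher mutual information, which is the sense in which the predictor is "a simple function of the MSE". The claim that processing the noisy amplitudes cannot help is consistent with the data processing inequality: any function of the noisy amplitude carries no more information about the clean amplitude than the noisy amplitude itself.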
