Non-intrusive intelligibility prediction for Mandarin speech in noise

Most existing intelligibility indices require access to the clean reference signal to predict speech intelligibility in noise. In some real-world applications, however, only the noise-masked speech is available, which makes these indices of little use. The present study assessed an intelligibility measure that predicts speech intelligibility in noise non-intrusively (i.e., with no access to the clean input signal), using only information extracted from the envelopes of the noise-masked speech. The proposed measure (denoted ModA) was computed by integrating the area of the modulation spectrum (from 0.5 Hz to 10 Hz) of the noise-masked envelopes extracted in four acoustic bands. ModA was evaluated against intelligibility scores obtained from normal-hearing listeners presented with Mandarin sentences corrupted by three types of maskers. A high correlation (r = 0.90) was obtained between ModA values and the listeners' intelligibility scores, suggesting that the modulation-spectrum area could potentially serve as a simple but efficient predictor of speech intelligibility in noisy conditions.
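The computation described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the band envelopes have already been extracted and downsampled, and the function names (`modulation_spectrum`, `mod_area`), the 0.5 Hz frequency step, the normalization by the envelope mean, and the averaging across bands are all assumptions made for illustration.

```python
import math


def modulation_spectrum(envelope, fs_env, freqs):
    """Magnitude of the DFT of a mean-removed envelope at the given frequencies (Hz).

    Assumed normalization: scaling by 2 / (n * mean) so that a sinusoidal
    modulation of depth m yields a peak close to m.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    spectrum = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs_env
        re = sum((x - mean) * math.cos(w * k) for k, x in enumerate(envelope))
        im = sum((x - mean) * math.sin(w * k) for k, x in enumerate(envelope))
        spectrum.append(2.0 * math.hypot(re, im) / (n * mean))
    return spectrum


def mod_area(band_envelopes, fs_env, f_lo=0.5, f_hi=10.0, df=0.5):
    """Area under the modulation spectrum between f_lo and f_hi, averaged over bands.

    band_envelopes: list of envelopes (one per acoustic band), each a list of
    positive samples at rate fs_env. The step df is a hypothetical choice.
    """
    n_freqs = int(round((f_hi - f_lo) / df)) + 1
    freqs = [f_lo + i * df for i in range(n_freqs)]
    areas = []
    for env in band_envelopes:
        spec = modulation_spectrum(env, fs_env, freqs)
        # Trapezoidal integration over the modulation-frequency axis.
        area = sum(0.5 * (spec[i] + spec[i + 1]) * df for i in range(len(spec) - 1))
        areas.append(area)
    return sum(areas) / len(areas)


def toy_envelope(depth, fs_env=50.0, duration=4.0, f_mod=4.0):
    """Synthetic envelope with a single sinusoidal modulation (illustration only)."""
    n = int(fs_env * duration)
    return [1.0 + depth * math.sin(2.0 * math.pi * f_mod * k / fs_env) for k in range(n)]


# Four bands with deep modulations versus four bands with flattened modulations,
# the latter standing in for envelopes masked by noise.
lightly_masked = mod_area([toy_envelope(0.8)] * 4, fs_env=50.0)
heavily_masked = mod_area([toy_envelope(0.2)] * 4, fs_env=50.0)
```

In this toy setup the envelopes with shallower modulations give a smaller area, which matches the intuition that noise fills in the envelope dips and reduces the modulation-spectrum area. With real speech, the band envelopes would first have to be obtained from the noise-masked signal, for example by band-pass filtering and envelope detection, which this sketch leaves out.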
