A non-intrusive Short-Time Objective Intelligibility measure

We propose a non-intrusive intelligibility measure for noisy and non-linearly processed speech, i.e., a measure that predicts intelligibility from a degraded speech signal without requiring a clean reference signal. The proposed measure is based on the Short-Time Objective Intelligibility (STOI) measure. Specifically, the non-intrusive STOI measure first estimates the clean signal amplitude envelopes from the degraded signal. The STOI measure is then evaluated using the envelopes of the degraded signal and the estimated clean envelopes. The performance of the proposed measure is evaluated on a dataset of speech in different noise types, processed with binary masks. The measure is shown to predict intelligibility well in all tested conditions except those that include a single competing speaker. Although the measure does not perform as well as the original (intrusive) STOI measure, it is shown to outperform existing non-intrusive measures.
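The second step described above, comparing degraded envelopes against estimated clean envelopes, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the envelopes are already available as per-band lists of frame amplitudes, it leaves out the envelope estimation step (which the abstract does not specify), and it omits the normalization and clipping used in the full STOI measure. The names `segment_correlation` and `stoi_like_score` and the 30-frame default segment length are assumptions chosen for illustration.

```python
import math


def segment_correlation(x, y):
    """Pearson correlation between two equal-length envelope segments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant segment has no defined correlation; score it as 0.
    return num / den if den > 0 else 0.0


def stoi_like_score(est_clean_env, degraded_env, seg_len=30):
    """Average short-time correlation over all bands and sliding segments.

    Each envelope is a list of bands; each band is a list of per-frame
    amplitudes. `est_clean_env` stands in for the estimated clean
    envelopes (hypothetical input, produced elsewhere).
    """
    scores = []
    for clean_band, deg_band in zip(est_clean_env, degraded_env):
        for end in range(seg_len, len(clean_band) + 1):
            start = end - seg_len
            scores.append(
                segment_correlation(clean_band[start:end], deg_band[start:end])
            )
    if not scores:
        raise ValueError("envelopes are shorter than one segment")
    return sum(scores) / len(scores)
```

In this simplified form the score lies between -1 and 1, and it approaches 1 when the degraded envelopes track the estimated clean envelopes closely within each short-time segment.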
