Outcome measures based on classification performance fail to predict the intelligibility of binary-masked speech.

To date, the most commonly used outcome measure for assessing ideal binary mask estimation algorithms is based on the difference between the hit rate and the false alarm rate (H-FA). Recently, the distribution of mask errors has been shown to substantially affect intelligibility. However, H-FA treats each mask unit independently and does not take into account how errors are distributed. Alternatively, algorithms can be evaluated by applying the short-time objective intelligibility (STOI) metric to the reconstructed speech. This study investigates the ability of H-FA and STOI to predict intelligibility for binary-masked speech using masks with different error distributions. The results demonstrate the inability of H-FA to predict behavioral intelligibility and also illustrate the limitations of STOI. Since every estimation algorithm will make errors that are distributed in different ways, performance evaluations should not be made solely on the basis of these metrics.
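To make the point about error distribution concrete, here is a minimal sketch of how H-FA could be computed for a pair of binary masks. The function name `h_fa`, the toy 4x8 masks, and the convention of returning 0 for an empty class are illustrative assumptions, not taken from the study. The two estimated masks contain the same number of misses and false alarms, once scattered and once clustered, and receive the same score.

```python
def h_fa(ideal, estimated):
    """Hit rate minus false-alarm rate for two binary masks.

    Masks are lists of rows holding 0/1 values (hypothetical toy format).
    A hit is a unit that is 1 in both masks; a false alarm is a unit that
    is 0 in the ideal mask but 1 in the estimated mask.
    """
    hits = false_alarms = ones = zeros = 0
    for ideal_row, est_row in zip(ideal, estimated):
        for i, e in zip(ideal_row, est_row):
            if i == 1:
                ones += 1
                hits += int(e == 1)
            else:
                zeros += 1
                false_alarms += int(e == 1)
    # Assumption: an empty class contributes a rate of 0.
    hit_rate = hits / ones if ones else 0.0
    fa_rate = false_alarms / zeros if zeros else 0.0
    return hit_rate - fa_rate


# Toy ideal mask: 16 target-dominated units (1) and 16 noise-dominated units (0).
ideal = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]

# Four misses and four false alarms spread across the mask.
scattered = [
    [0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1],
]

# The same error counts, grouped into two 2x2 blocks.
clustered = [
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
]

print(h_fa(ideal, scattered))  # 0.5
print(h_fa(ideal, clustered))  # 0.5
```

Because the score depends only on the counts of hits and false alarms, any rearrangement of the same errors leaves it unchanged, which is the property the abstract identifies as a limitation.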
