Structure in time-frequency binary masking errors and its impact on speech intelligibility.

Although the ideal binary mask is impractical as an algorithm because it requires prior knowledge of the target and interfering signals, the substantial intelligibility gains it produces make it a desirable benchmark. While this benchmark has been studied extensively, many questions remain about the factors that influence the intelligibility of binary-masked speech with non-ideal masks. To date, researchers have primarily used uniformly random, uncorrelated mask errors and independently presented error types (i.e., false positives and false negatives) to characterize the influence of estimation errors on intelligibility. However, practical estimation algorithms produce masks that contain errors of both types with non-trivial amounts of structure. This paper introduces an investigation framework for binary masks and presents listener studies that use this framework to illustrate how interactions between error types and structure affect intelligibility. First, this study demonstrates that clustering of mask errors, a form of structure, reduces intelligibility. Furthermore, while previous research has suggested that false positives are more detrimental to intelligibility than false negatives, this study indicates that false negatives can be equally detrimental when they contain structure or when both error types are present. Finally, this study shows that listeners tolerate fewer mask errors when both types of errors are present, especially when the errors contain structure.
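To make the terms in the abstract concrete, the following is a minimal sketch in Python with NumPy; it is not the investigation framework introduced in the paper. It assumes the common definition of the ideal binary mask (a time-frequency unit is kept when its local signal-to-noise ratio exceeds a local criterion) and uses a simple block-sharing trick as a stand-in for "structure". The function names `ideal_binary_mask`, `flip_mask`, and `error_rates`, the `block` parameter, and the random test spectrograms are all hypothetical and chosen only for illustration.

```python
import numpy as np


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 where the local SNR (dB) exceeds the local criterion, else 0."""
    tiny = np.finfo(float).tiny  # avoids division by zero and log of zero
    snr_db = 10.0 * np.log10((speech_power + tiny) / (noise_power + tiny))
    return (snr_db > lc_db).astype(np.uint8)


def flip_mask(mask, fp_rate, fn_rate, rng, block=1):
    """Inject false positives (0 -> 1) and false negatives (1 -> 0).

    block=1 draws one random number per unit (uncorrelated errors).
    block>1 shares one draw across each block x block patch, so errors
    cluster while each unit keeps the same marginal flip probability.
    This block scheme is an illustrative assumption, not the paper's model.
    """
    n_freq, n_time = mask.shape
    coarse = rng.random((-(-n_freq // block), -(-n_time // block)))
    u = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    u = u[:n_freq, :n_time]
    noisy = mask.copy()
    noisy[(mask == 0) & (u < fp_rate)] = 1  # false positives
    noisy[(mask == 1) & (u < fn_rate)] = 0  # false negatives
    return noisy


def error_rates(ideal, estimate):
    """Return (false-positive rate, false-negative rate) against the ideal mask."""
    zeros = ideal == 0
    ones = ideal == 1
    fp = np.count_nonzero(zeros & (estimate == 1)) / max(np.count_nonzero(zeros), 1)
    fn = np.count_nonzero(ones & (estimate == 0)) / max(np.count_nonzero(ones), 1)
    return fp, fn


# Random arrays stand in for speech and noise power spectrograms.
rng = np.random.default_rng(0)
speech = rng.random((64, 200))
noise = rng.random((64, 200))
ibm = ideal_binary_mask(speech, noise)
scattered = flip_mask(ibm, 0.1, 0.2, rng, block=1)  # uncorrelated errors
clustered = flip_mask(ibm, 0.1, 0.2, rng, block=4)  # clustered errors
print(error_rates(ibm, scattered), error_rates(ibm, clustered))
```

With `block=1` the errors are independent across units, which corresponds to the uniformly random, uncorrelated condition described above. With a larger `block` the expected error rates are the same but the errors arrive in patches, which gives a rough way to vary structure separately from error rate.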
