Evaluation of the importance of time-frequency contributions to speech intelligibility in noise.

Recent studies on binary masking techniques assume that each time-frequency (T-F) unit contributes equally to overall speech intelligibility. The present study demonstrates that the importance of each T-F unit to intelligibility varies with the speech content. Specifically, T-F units are divided into two classes: speech-present units and speech-absent units. Results indicate that the importance of each speech-present unit is closely tied to the loudness of its target component, whereas the importance of each speech-absent unit varies with the loudness of its masker component. Two types of mask error, misses and false alarms, are also examined. Consistent with previous work, false alarms are more harmful to intelligibility than misses when the mixture signal-to-noise ratio (SNR) is below 0 dB; however, the relative impact of the two error types depends on the SNR of the input signal. Based on these observations, a mask-based objective measure, the loudness-weighted hit-false, is proposed for predicting speech intelligibility. The proposed measure correlates significantly more strongly with intelligibility than two existing mask-based objective measures.
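
To illustrate the kind of measure described above, the sketch below computes a loudness-weighted hit-minus-false-alarm score from an ideal binary mask (IBM) and an estimated mask. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the loudness model or weighting, so a compressive power of T-F energy (exponent 0.3, loosely following Stevens' power law) stands in as a hypothetical loudness proxy, the normalization of hits and false alarms is assumed, and the 0 dB local-SNR criterion in the toy example is only one common choice for defining the IBM.

```python
# Hypothetical loudness-weighted hit-minus-false-alarm sketch.
# The exact loudness model and weighting used in the paper are not given in the
# abstract; a compressive power of T-F energy serves as a stand-in loudness proxy.

import numpy as np


def loudness_proxy(energy, exponent=0.3):
    """Hypothetical loudness approximation: compressive power of T-F energy."""
    return np.power(np.maximum(energy, 0.0), exponent)


def loudness_weighted_hit_false(ibm, estimated_mask, target_energy, masker_energy):
    """Compute a loudness-weighted HIT-FA style score over T-F units.

    ibm             -- ideal binary mask (1 = speech-present unit), shape (freq, time)
    estimated_mask  -- binary mask produced by a separation algorithm, same shape
    target_energy   -- per-unit energy of the clean target signal
    masker_energy   -- per-unit energy of the masker signal
    """
    ibm = ibm.astype(bool)
    est = estimated_mask.astype(bool)

    # Weight speech-present units by target loudness and speech-absent units by
    # masker loudness, mirroring the two unit classes described in the abstract.
    target_w = loudness_proxy(target_energy)
    masker_w = loudness_proxy(masker_energy)

    # Hits: speech-present units correctly retained by the estimated mask.
    hit = np.sum(target_w[ibm & est]) / max(np.sum(target_w[ibm]), 1e-12)
    # False alarms: speech-absent units incorrectly retained.
    fa = np.sum(masker_w[~ibm & est]) / max(np.sum(masker_w[~ibm]), 1e-12)

    return hit - fa


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (64, 100)                       # 64 channels x 100 frames
    target = rng.exponential(1.0, shape)    # toy target energies
    masker = rng.exponential(1.0, shape)    # toy masker energies
    ibm = (target > masker).astype(int)     # 0 dB local-SNR criterion (one common choice)
    est = ibm.copy()
    est[rng.random(shape) < 0.1] ^= 1       # flip 10% of units to simulate mask errors
    print("loudness-weighted hit-false score:", loudness_weighted_hit_false(ibm, est, target, masker))
```

The key design choice reflected here is that miss and false-alarm errors are penalized by different weights (target loudness versus masker loudness), so equally sized errors need not cost the same amount of predicted intelligibility.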
