Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise.

When a target-speech/masker mixture is processed with the ideal binary mask (IBM), a signal-separation technique, the intelligibility of the target speech improves markedly for both normal-hearing and hearing-impaired listeners. Speech intelligibility can also be improved by filling gaps in the speech with unmodulated broadband noise. This study investigated whether the intelligibility of target speech in an IBM-processed target-speech/masker mixture can be further improved by adding a broadband-noise background. In experiment 1, IBM processing markedly released target speech from masking by speech-spectrum noise, foreign speech, or native speech. Following IBM processing, adding a broadband-noise background at a signal-to-noise ratio of no less than 4 dB significantly improved target-speech intelligibility when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that because the added noise background makes the silent regions in the time-frequency domain of the IBM-processed mixture shallower, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid and cochlear-implant design, and the understanding of speech perception under "cocktail-party" conditions.
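The two manipulations described above, binary masking of time-frequency units and the addition of a noise background at a fixed signal-to-noise ratio, can be illustrated with a minimal sketch. It is not the procedure used in the study: the 0 dB local criterion, the use of white Gaussian noise as a stand-in for the broadband noise, the definition of the signal-to-noise ratio relative to the processed signal, and the function names `ideal_binary_mask` and `add_noise_at_snr` are all assumptions made for illustration.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, lc_db=0.0):
    """Return 1.0 where the local target-to-masker ratio exceeds lc_db, else 0.0.

    The 0 dB default criterion is an assumption, not a value from the study.
    """
    eps = 1e-12  # avoids division by zero and log of zero in silent units
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    return (local_snr_db > lc_db).astype(float)


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise scaled so that signal power / noise power = snr_db.

    White noise is an assumed stand-in for the broadband-noise background.
    Returns the noisy signal and the scaled noise that was added.
    """
    noise = rng.standard_normal(signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return signal + scaled_noise, scaled_noise


rng = np.random.default_rng(0)

# Toy time-frequency power values (rows: frequency channels, columns: frames).
target_power = np.array([[4.0, 0.1], [0.5, 9.0]])
masker_power = np.ones((2, 2))
mask = ideal_binary_mask(target_power, masker_power)

# Toy waveform with a silent gap, standing in for an IBM-processed mixture.
processed = np.sin(np.linspace(0.0, 40.0 * np.pi, 1600))
processed[400:800] = 0.0
noisy, noise = add_noise_at_snr(processed, 4.0, rng)

# Signal-to-noise ratio actually achieved, in dB.
achieved_snr_db = 10.0 * np.log10(np.mean(processed ** 2) / np.mean(noise ** 2))
```

In this sketch the mask retains only the units where the target dominates the masker, and the added noise raises the floor of the silent gap in `processed` without altering the retained samples beyond the additive noise.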
