Factors influencing glimpsing of speech in noise.

The idea that listeners can "glimpse" target speech in the presence of competing noise has been supported by many studies and rests on the assumption that listeners pick up fragments of the target occurring at different times and somehow patch them together to hear out the target speech. The factors influencing glimpsing in noise are not well understood and are examined in the present study; specifically, we examine the effects of the frequency location, spectral width, and duration of the glimpses. Stimuli were constructed using an ideal time-frequency (T-F) masking technique that ensures the target is stronger than the masker in certain T-F regions of the mixture, thereby rendering those regions easier to glimpse than others. Sentences were synthesized with this technique, with glimpse information placed in several frequency regions while the glimpse window duration and the total duration of glimpsing were varied. Results indicated that the frequency location and total duration of the glimpses had a significant effect on speech recognition, with the highest performance obtained when listeners were able to glimpse information in the F1/F2 frequency region (0-3 kHz) for at least 60% of the utterance.
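As an illustration of the processing idea, below is a minimal Python sketch of ideal binary (time-frequency) masking, assuming separate access to the target and masker signals before mixing. The function name, sampling rate, window length, and the 0-dB local SNR criterion are illustrative assumptions, not the paper's actual processing chain.

import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(target, masker, fs=16000, nperseg=512, lc_db=0.0):
    """Keep only the T-F units where the target exceeds the masker by at
    least lc_db (the local SNR criterion); zero out everything else.
    Illustrative sketch: fs, nperseg, and lc_db are assumed values."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    # Local SNR (in dB) of each time-frequency unit
    snr = 20.0 * np.log10((np.abs(T) + 1e-12) / (np.abs(M) + 1e-12))
    mask = (snr > lc_db).astype(float)
    # Apply the binary mask to the mixture spectrogram and resynthesize
    # the "glimpsed" signal
    _, _, X = stft(target + masker, fs=fs, nperseg=nperseg)
    _, glimpsed = istft(mask * X, fs=fs, nperseg=nperseg)
    return glimpsed, mask

Zeroing the mask outside a chosen band (e.g., retaining only channels below 3 kHz) would approximate the frequency-location manipulation described above, and limiting the proportion of frames in which the mask is nonzero would approximate the glimpse-duration manipulation.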
