Switching Streams Across Ears to Evaluate Informational Masking of Speech-on-Speech

Objectives: This study aimed to evaluate the informational component of speech-on-speech masking. Speech perception in the presence of a competing talker involves not only informational masking (IM) but also a number of masking processes involving interaction of masker and target energy in the auditory periphery. Such peripherally generated masking can be eliminated by presenting the target and masker in opposite ears (dichotically). However, this also reduces IM by providing listeners with lateralization cues that support spatial release from masking (SRM). In tonal sequences, IM can be isolated by rapidly switching the lateralization of dichotic target and masker streams across the ears, presumably producing ambiguous spatial percepts that interfere with SRM. However, it is not clear whether this technique works with speech materials.

Design: Speech reception thresholds (SRTs) were measured in 17 young normal-hearing adults for sentences produced by a female talker in the presence of a competing male talker under three different conditions: diotic (target and masker in both ears), dichotic, and dichotic but switching the target and masker streams across the ears. Because switching rate and signal coherence were expected to influence the amount of IM observed, these two factors were varied across conditions. When switches occurred, they were either at word boundaries or periodic (every 116 msec), and either with or without a brief gap (84 msec) at every switch point. In addition, SRTs were measured in a quiet condition to rule out audibility as a limiting factor.

Results: SRTs were poorer for the four switching dichotic conditions than for the nonswitching dichotic condition, but better than for the diotic condition. Periodic switches without gaps yielded the poorest SRTs of the four switching conditions, and thus the most IM.

Conclusions: These findings suggest that, of the conditions tested, periodically switching the target and masker streams across the ears (without gaps) was the most effective at disrupting SRM. Thus, this approach can be used in experiments that seek a relatively pure measure of IM, and could be readily extended to translational research.
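
The periodic switching manipulation described under Design can be illustrated with a minimal sketch. It is not the authors' stimulus-generation code. It assumes that the target and masker are equal-length lists of samples, that the target starts in the left ear, and that a gap is realized as silence inserted in both ears at each switch point (which lengthens the signal). The function name `switch_streams` and its parameters are hypothetical; only the 116 msec period and the 84 msec gap come from the text.

```python
from typing import List, Sequence, Tuple


def switch_streams(
    target: Sequence[float],
    masker: Sequence[float],
    fs: int,
    period_ms: float = 116.0,  # switch period reported in the abstract
    gap_ms: float = 0.0,       # set to 84.0 for the "with gap" variant
) -> Tuple[List[float], List[float]]:
    """Return (left, right) channels with target and masker swapping ears.

    Assumption: gaps are silence inserted in both ears at every switch
    point; the abstract does not say how the gaps were implemented.
    """
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")

    period = max(1, round(fs * period_ms / 1000.0))  # samples per segment
    gap = round(fs * gap_ms / 1000.0)                # samples of silence

    left: List[float] = []
    right: List[float] = []
    for k, start in enumerate(range(0, len(target), period)):
        t_seg = list(target[start:start + period])
        m_seg = list(masker[start:start + period])
        if k > 0 and gap > 0:
            # Silence in both ears at the switch point.
            left.extend([0.0] * gap)
            right.extend([0.0] * gap)
        if k % 2 == 0:
            # Even segments: target left, masker right.
            left.extend(t_seg)
            right.extend(m_seg)
        else:
            # Odd segments: streams swap ears.
            left.extend(m_seg)
            right.extend(t_seg)
    return left, right
```

Calling `switch_streams(target, masker, fs)` gives the periodic no-gap variant, and passing `gap_ms=84.0` gives the variant with gaps. Switching at word boundaries would instead need per-word segment times, which the abstract does not provide.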
