Spatial cues alone produce inaccurate sound segregation: the effect of interaural time differences.

To clarify the role of spatial cues in sound segregation, this study examined whether interaural time differences (ITDs) are sufficient to allow listeners to identify a novel sound source in a mixture of sources. Listeners heard mixtures of two synthetic sounds, a target and a distractor, each of which possessed naturalistic spectrotemporal correlations but otherwise lacked strong grouping cues, and which had either the same ITD or different ITDs. When the task was to judge whether a probe sound matched a source in the preceding mixture, performance improved greatly when the same target was presented repeatedly across distinct distractors, consistent with previous results. In contrast, performance improved only slightly with ITD separation of target and distractor, even when spectrotemporal overlap between them was reduced. However, when subjects localized, rather than identified, the sources in the mixture, sources with different ITDs were reported as two sources at distinct and accurately identified locations. ITDs alone thus enable listeners to perceptually segregate mixtures of sources, but the perceived content of these sources is inaccurate when other segregation cues, such as harmonicity and common onsets and offsets, do not also promote proper source separation.
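As an illustration of what "same or different ITDs" means at the signal level, the following is a minimal Python sketch of a two-source mixture in which each source is given its own ITD by delaying one ear's copy. It is not the stimulus-generation procedure used in the study: the Gaussian-noise sources, the 48 kHz sample rate, the ±500 µs ITDs, the whole-sample rounding of the delay, the sign convention (positive ITD delays the left ear), and the function names `apply_itd` and `mix_with_itds` are all assumptions made for the example.

```python
import numpy as np

def apply_itd(mono, itd_s, fs):
    """Return an (n, 2) [left, right] array carrying the requested ITD.

    Assumed convention: a positive itd_s delays the left ear, so the
    right ear leads. The delay is rounded to whole samples for simplicity.
    """
    shift = int(round(abs(itd_s) * fs))
    # Delay by prepending zeros and truncating to the original length.
    delayed = np.concatenate([np.zeros(shift), mono])[:len(mono)]
    if itd_s >= 0:
        left, right = delayed, mono
    else:
        left, right = mono, delayed
    return np.stack([left, right], axis=1)

def mix_with_itds(target, distractor, target_itd_s, distractor_itd_s, fs):
    """Sum two equal-length sources after giving each its own ITD."""
    return (apply_itd(target, target_itd_s, fs)
            + apply_itd(distractor, distractor_itd_s, fs))

# Hypothetical stand-in sources: independent Gaussian noise, 0.5 s each.
fs = 48000
rng = np.random.default_rng(0)
target = rng.standard_normal(fs // 2)
distractor = rng.standard_normal(fs // 2)

# "Different ITDs" condition: sources lateralized to opposite sides.
mixture_diff = mix_with_itds(target, distractor, 500e-6, -500e-6, fs)
# "Same ITD" condition: both sources share one interaural delay.
mixture_same = mix_with_itds(target, distractor, 500e-6, 500e-6, fs)
```

In the same-ITD condition both sources share one interaural delay, so the binaural signal offers no spatial basis for separating them. In the different-ITD condition the two ears' signals differ in how the sources are time-aligned, which is the only cue distinguishing the two conditions in this sketch.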
