Location and acoustic scale cues in concurrent speech recognition.

Location and acoustic scale cues have both been shown to affect the recognition of speech in multi-speaker environments. This study examines the interaction of these two variables. Subjects were presented with concurrent triplets of syllables from a target voice and a distracting voice, and were asked to recognize a specific target syllable. The task was made more or less difficult by changing (a) the location of the distracting speaker, (b) the scale difference between the two speakers, and/or (c) the relative level of the two speakers. Scale differences were produced by changing the vocal tract length and glottal pulse rate during syllable synthesis; 32 acoustic scale differences were used. Location cues were produced by convolving head-related transfer functions with the stimulus. The angle between the target speaker and the distracter was 0, 4, 8, 16, or 32 degrees in the 0-degree horizontal plane. The level of the target relative to the distracter was 0 or -6 dB. The results show that location and scale difference interact, and that the interaction is greatest when one of these cues is small. Increasing either the acoustic scale difference or the angle between the target and distracter speakers quickly raises performance to ceiling levels.
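The two stimulus manipulations that are purely signal-level operations, convolution with head-related impulse responses and setting the target-to-distracter level, can be illustrated with a minimal sketch. This is not the authors' stimulus-generation code: the function names (`spatialize`, `set_relative_level`, `mix`), the use of RMS as the level measure, and the choice to set the level before spatialization are all assumptions made for illustration, and the impulse responses would have to come from a measured HRTF set that is not specified here.

```python
import numpy as np


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return np.sqrt(np.mean(np.square(x)))


def set_relative_level(target, distracter, level_db):
    """Scale the mono target so its RMS sits level_db relative to the distracter.

    level_db = 0 gives equal RMS; level_db = -6 puts the target 6 dB below.
    (RMS as the level measure is an assumption of this sketch.)
    """
    gain = (rms(distracter) / rms(target)) * 10.0 ** (level_db / 20.0)
    return gain * target


def spatialize(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right impulse-response pair.

    Returns an array of shape (n_samples, 2). Both impulse responses are
    assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)


def mix(a, b):
    """Sum two binaural signals, zero-padding the shorter one."""
    n = max(len(a), len(b))
    out = np.zeros((n, 2))
    out[: len(a)] += a
    out[: len(b)] += b
    return out
```

In use, the target would first be scaled with `set_relative_level(target, distracter, -6.0)`, each voice would then be passed through `spatialize` with the impulse-response pair for its azimuth, and the two binaural signals would be summed with `mix`.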
