The interaction of vocal characteristics and audibility in the recognition of concurrent syllables.

In concurrent-speech recognition, performance improves when either the glottal pulse rate (GPR) or the vocal tract length (VTL) of the target speaker differs from that of the distracter, but relatively little is known about the trading relationship between these two variables, or about how they interact with other cues such as the signal-to-noise ratio (SNR). This paper presents a study in which listeners were asked to identify a target syllable presented together with a distracter syllable, with the temporal envelopes of the two syllables carefully matched. The syllables varied in GPR and VTL over a large range and were presented at several SNRs. The results show that performance is particularly sensitive to the combination of GPR and VTL when the SNR is 0 dB. Equal-performance contours show that, in the absence of other cues, a two-semitone difference in GPR produces the same performance advantage as a 20% difference in VTL. This corresponds to a trading relationship between GPR and VTL of 1.6. The results illustrate that the auditory system can use any combination of differences in GPR, VTL, and SNR to segregate competing speech signals.
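The reported trading relationship can be checked with a short calculation. The sketch below assumes, because the abstract does not define the term, that the trading relationship is the ratio of the VTL difference to the GPR difference when both are expressed in semitones (12 × log2 of the ratio). Under that assumption a 20% VTL difference is about 3.16 semitones, and dividing by the two-semitone GPR difference gives about 1.58, which rounds to the reported 1.6. The function names `semitones` and `trading_ratio` are hypothetical helpers for illustration only.

```python
import math


def semitones(ratio: float) -> float:
    """Express a frequency or length ratio as a difference in semitones (12 per octave)."""
    return 12.0 * math.log2(ratio)


def trading_ratio(vtl_ratio: float, gpr_semitones: float) -> float:
    """Assumed definition: semitones of VTL difference per semitone of GPR difference
    that yield the same performance advantage."""
    return semitones(vtl_ratio) / gpr_semitones


# 20% VTL difference (ratio 1.2) matched against a 2-semitone GPR difference.
vtl_st = semitones(1.20)          # about 3.16 semitones
ratio = trading_ratio(1.20, 2.0)  # about 1.58, i.e. roughly 1.6
print(f"20% VTL = {vtl_st:.2f} semitones; trading ratio = {ratio:.2f}")
```

Putting both parameters on the same logarithmic scale makes GPR and VTL differences directly comparable. A linear comparison gives a similar figure (a two-semitone GPR difference is about 12.2%, and 20/12.2 ≈ 1.63), so the abstract alone does not settle which definition underlies the value of 1.6.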
