Discrimination of speaker size from syllable phrases.

The length of the vocal tract is correlated with speaker size, so speech sounds carry information about the size of the speaker in a form that the listener can interpret. A wide range of vocal tract lengths exists in the population, and humans are able to judge speaker size from speech. Smith et al. [J. Acoust. Soc. Am. 117, 305-318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes, which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage in the processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, discrimination performance was better than that obtained with vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing.
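As a rough illustration of why vocal tract length leaves a cue to size in the speech signal, the sketch below treats the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / (4L). This is a textbook idealization, not the method or stimuli used in the study; the function name, the speed-of-sound value, and the two example lengths (14 cm and 17.5 cm) are assumptions chosen only for illustration.

```python
# Idealized quarter-wavelength tube model (an assumption, not the paper's method).
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value for warm, humid air


def tube_resonances(vtl_cm, n_formants=3):
    """Return the first resonances (Hz) of a uniform tube of length vtl_cm,
    closed at one end and open at the other: F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * vtl_cm)
        for n in range(1, n_formants + 1)
    ]


# Two hypothetical vocal tract lengths, chosen only for illustration.
shorter = tube_resonances(14.0)
longer = tube_resonances(17.5)

# Every resonance shifts by the same factor, the inverse ratio of the lengths.
ratios = [s / l for s, l in zip(shorter, longer)]

print("shorter tract:", shorter)
print("longer tract: ", longer)
print("ratios:       ", ratios)
```

In this simplified model, changing the tube length scales all resonances by a common factor, so the pattern of resonances is preserved while its position on the frequency axis moves. Real vocal tracts are not uniform tubes, so the numbers should be read as indicative only.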
