Discrimination and recognition of scaled word sounds

Smith et al. [2] and Ives et al. [3] demonstrated that humans can extract information about the size of a speaker's vocal tract from speech sounds (vowels and syllables, respectively). We have extended their discrimination and recognition experiments to naturally pronounced words. The just noticeable difference (JND) for size discrimination ranged from 5.5% to 19%, depending on the listener. The smallest JND is comparable to that found in the syllable experiments, and the average JND is comparable to that found in the vowel experiments. Word recognition scores remained above 50% for speaker sizes beyond the normal human range. The fact that good performance extends over such a large range of acoustic scales supports Irino and Patterson’s hypothesis [1] that the auditory system segregates size and shape information at an early stage of processing.

[1]  Richard E. Turner, et al. The processing and perception of size information in speech sounds. 2005, The Journal of the Acoustical Society of America.

[2]  Hideki Kawahara, et al. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. 2006.

[3]  W. Fitch, et al. Morphology and development of the human vocal tract: a study using magnetic resonance imaging. 1999, The Journal of the Acoustical Society of America.

[4]  Roy D. Patterson, et al. Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform. 2002, Speech Communication.

[5]  Roy D. Patterson, et al. Perception of acoustic scale and size in musical instrument sounds. 2006, The Journal of the Acoustical Society of America.

[6]  Diane Kewley-Port, et al. STRAIGHT: A new speech synthesizer for vowel formant discrimination. 2004.

[7]  Hideki Kawahara, et al. Speech intelligibility derived from time-frequency and source smearing. 2005, INTERSPEECH.

[8]  Shuichi Sakamoto, et al. Complementary relationship between familiarity and SNR in word intelligibility test. 2004.

[9]  F. A. Wichmann, et al. The psychometric function: I. Fitting, sampling, and goodness of fit. 2001.

[10]  Roy D. Patterson, et al. Discrimination of speaker size from syllable phrases. 2005, The Journal of the Acoustical Society of America.

[11]  Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. 1999, Speech Communication.