Identification of frequency-shifted vowels.

Within certain limits, speech intelligibility is preserved when the spectral envelope is scaled upward or downward. To study these limits and assess their interaction with fundamental frequency (F0), vowels in /hVd/ syllables were processed using the STRAIGHT vocoder and presented to listeners for identification. Identification accuracy declined gradually as the spectral envelope was scaled up or down in vowels spoken by men, women, and children. Upward spectral envelope shifts led to poorer identification of children's vowels than of adults' vowels, while downward shifts had a greater impact on men's vowels than on those of women and children. Coordinated shifts (F0 and spectral envelope shifted in the same direction) generally produced higher accuracy than conditions in which F0 and spectral envelope were shifted in opposite directions. Vowel identification was poorest in conditions with very high F0, consistent with suggestions from the literature that sparse sampling of the spectral envelope may be a factor in vowel identification. However, the gradual decline in accuracy with both upward and downward spectral envelope shifts, together with the interaction between spectral envelope shifts and F0, suggests the additional operation of perceptual mechanisms sensitive to the statistical covariation of F0 and formant frequencies in natural speech.
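As a rough illustration of what scaling the spectral envelope means, the sketch below resamples a toy envelope so that a peak at f Hz moves to factor × f Hz, and classifies a condition as coordinated when F0 and the envelope are shifted in the same direction. It is not the STRAIGHT processing used in the study: the function names `scale_envelope` and `is_coordinated`, the Gaussian "formant" peak, the 10 Hz frequency grid, and the scale factors are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def scale_envelope(freqs, envelope, factor):
    """Resample an envelope so that a peak at f Hz moves to factor * f Hz.

    Hypothetical helper: linear interpolation on an ascending frequency grid,
    holding the edge values outside the sampled range.
    """
    scaled = []
    for f in freqs:
        src = f / factor  # frequency in the original envelope that lands on f
        if src <= freqs[0]:
            scaled.append(envelope[0])
        elif src >= freqs[-1]:
            scaled.append(envelope[-1])
        else:
            i = bisect_right(freqs, src) - 1
            w = (src - freqs[i]) / (freqs[i + 1] - freqs[i])
            scaled.append((1.0 - w) * envelope[i] + w * envelope[i + 1])
    return scaled


def is_coordinated(f0_factor, envelope_factor):
    """True when F0 and the envelope are both raised or both lowered."""
    return (f0_factor - 1.0) * (envelope_factor - 1.0) > 0


# Toy envelope: one Gaussian peak at 1000 Hz on a 10 Hz grid (assumed values).
freqs = [10.0 * k for k in range(801)]
envelope = [math.exp(-0.5 * ((f - 1000.0) / 100.0) ** 2) for f in freqs]

shifted = scale_envelope(freqs, envelope, 1.2)  # upward shift by a factor of 1.2
peak_hz = freqs[shifted.index(max(shifted))]    # location of the shifted peak
```

With a factor of 1.2 the toy peak moves from 1000 Hz to 1200 Hz. Scaling multiplies every envelope frequency by the same factor, so the ratios between peaks are preserved while their absolute positions change.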
