Modeling the perception of frequency-shifted vowels

A notable property of speech perception is that intelligibility is preserved when the spectrum is shifted up or down the frequency scale over a fairly wide range. To study the relationship between shifts in fundamental frequency (F0) and spectrum envelope in vowel perception, we used a high-quality vocoder (STRAIGHT) to process a set of vowels spoken by three adult males in /hVd/ context. Identification accuracy dropped by about 30% when the spectrum envelope was scaled upward by a factor of 2.0 and, in a separate condition, by about 50% when F0 was raised by 2 octaves. However, when spectrum envelope and F0 were raised together, identification accuracy was markedly higher than when either cue was manipulated alone. This synergy between formant frequencies and F0 was predicted by a model that accounts for the intelligibility of frequency-shifted vowels in terms of learned relationships between measured values of F0 and formant frequencies. A second model, based on auditory excitation patterns, predicted the main effects of F0 and spectrum envelope but not the pattern of interaction.
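The two manipulations are stated in different units: the spectrum envelope shift as a multiplicative factor (2.0, i.e. one octave) and the F0 shift in octaves (2 octaves, i.e. a factor of 4). The following minimal sketch shows only that arithmetic. It does not perform STRAIGHT analysis or resynthesis, and the function names, the starting F0, and the formant values are hypothetical placeholders rather than data from the study.

```python
import math


def octaves_to_ratio(octaves: float) -> float:
    """Convert a shift in octaves to a multiplicative frequency ratio."""
    return 2.0 ** octaves


def ratio_to_octaves(ratio: float) -> float:
    """Convert a multiplicative frequency ratio to a shift in octaves."""
    return math.log2(ratio)


def shift_vowel(f0_hz, formants_hz, f0_octaves=0.0, envelope_scale=1.0):
    """Return target F0 and formant frequencies after the two shifts.

    Scaling the spectrum envelope by a factor multiplies every formant
    frequency by that factor; F0 is shifted independently.
    """
    shifted_f0 = f0_hz * octaves_to_ratio(f0_octaves)
    shifted_formants = [f * envelope_scale for f in formants_hz]
    return shifted_f0, shifted_formants


# Hypothetical adult-male vowel parameters (illustrative only).
f0 = 120.0
formants = [300.0, 2300.0, 3000.0]

# The three kinds of condition described in the abstract.
conditions = {
    "envelope only": dict(f0_octaves=0.0, envelope_scale=2.0),
    "F0 only": dict(f0_octaves=2.0, envelope_scale=1.0),
    "both": dict(f0_octaves=2.0, envelope_scale=2.0),
}

for name, params in conditions.items():
    new_f0, new_formants = shift_vowel(f0, formants, **params)
    print(name, new_f0, new_formants)
```

Expressing both shifts on a common log-frequency (octave) scale makes it easier to see that the two cues were moved by different amounts in the single-cue conditions.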
