Identification of resynthesized /hVd/ utterances: effects of formant contour.

The purpose of this study was to examine the role of formant frequency movements in vowel recognition. Measurements of vowel duration, fundamental frequency, and formant contours were taken from a database of acoustic measurements of 1668 /hVd/ utterances spoken by 45 men, 48 women, and 46 children [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. A 300-utterance subset was selected from this database, with equal numbers of tokens of each of 12 vowels and approximately equal numbers of tokens produced by men, women, and children. Listeners identified the original, naturally produced signals and two formant-synthesized versions of them. One set of "original formant" (OF) synthetic signals was generated using the measured formant contours, and a second set of "flat formant" (FF) signals was synthesized with formant frequencies fixed at the values measured at the steadiest portion of the vowel. Results included: (a) the OF synthetic signals were identified with substantially greater accuracy than the FF signals; and (b) the naturally produced signals were identified with greater accuracy than the OF synthetic signals. Pattern recognition results showed that a simple approach to vowel specification, based on duration, steady-state F0, and formant frequency measurements at 20% and 80% of vowel duration, accounts for much, but by no means all, of the variation in listeners' labeling of the three types of stimuli.
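The pattern-recognition idea summarized above (each vowel token reduced to duration, steady-state F0, and F1, F2, and F3 sampled at 20% and 80% of vowel duration) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each token is a hypothetical dict with a vowel label, `duration_ms`, `f0_hz`, and F1 to F3 contours given at `times_ms` measured from vowel onset; contours are sampled by linear interpolation; and the classifier is a Gaussian model with diagonal covariances and equal priors, a simplified stand-in rather than the discriminant procedure the authors used. The hypothetical helper `flat_formant_version` mimics the FF manipulation by holding every formant at its value at a given steady-state time.

```python
import math
from bisect import bisect_right
from statistics import mean, pvariance

FORMANTS = ("F1", "F2", "F3")


def sample_contour(times_ms, values_hz, t_ms):
    """Linearly interpolate a formant contour at time t_ms (clamped to the ends)."""
    if t_ms <= times_ms[0]:
        return values_hz[0]
    if t_ms >= times_ms[-1]:
        return values_hz[-1]
    i = bisect_right(times_ms, t_ms)  # times_ms[i - 1] <= t_ms < times_ms[i]
    t0, t1 = times_ms[i - 1], times_ms[i]
    v0, v1 = values_hz[i - 1], values_hz[i]
    return v0 + (v1 - v0) * (t_ms - t0) / (t1 - t0)


def vowel_features(token):
    """Duration, steady-state F0, then F1-F3 at 20% and at 80% of vowel duration."""
    dur = token["duration_ms"]
    feats = [dur, token["f0_hz"]]
    for frac in (0.2, 0.8):
        for name in FORMANTS:
            feats.append(sample_contour(token["times_ms"], token[name], frac * dur))
    return feats


def flat_formant_version(token, steady_ms):
    """Hypothetical FF counterpart: every formant held at its value at steady_ms."""
    flat = dict(token)  # shallow copy; formant lists are replaced, not mutated
    for name in FORMANTS:
        value = sample_contour(token["times_ms"], token[name], steady_ms)
        flat[name] = [value] * len(token["times_ms"])
    return flat


def fit(tokens):
    """Per-vowel mean and variance of each feature (diagonal Gaussian model)."""
    rows_by_vowel = {}
    for tok in tokens:
        rows_by_vowel.setdefault(tok["vowel"], []).append(vowel_features(tok))
    model = {}
    for vowel, rows in rows_by_vowel.items():
        columns = list(zip(*rows))
        means = [mean(col) for col in columns]
        # Floor the variance so a feature that never varies cannot divide by zero.
        variances = [max(pvariance(col), 1e-6) for col in columns]
        model[vowel] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of feature vector x under independent per-feature Gaussians."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(model, token):
    """Return the vowel label with the highest likelihood (equal priors assumed)."""
    x = vowel_features(token)
    return max(model, key=lambda vowel: log_likelihood(x, *model[vowel]))
```

Equal priors match a stimulus set balanced across vowels (300 tokens / 12 vowels = 25 per vowel). For a flat-formant token the 20% and 80% formant values are identical, so a model of this kind receives no information about formant movement from FF stimuli; that is the contrast the OF/FF comparison is designed to probe.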
