The influence of spectral prominence on perceived vowel quality.

Research indicates that, when the first and second formants of a vowel are separated by less than about 3.5 Bark, perception of its height and of some other aspects of its quality is determined by a weighted average of the low-frequency spectrum, rather than by particular harmonic or hypothetical formant frequencies (which determine quality when formants are more widely spaced). This spectral averaging has been called the center of gravity (COG) effect. Although the existence of the effect is generally accepted, the factors that govern it are poorly understood. One possibility is that the influence of the spectral envelope on perceived vowel quality increases as low-frequency spectral prominences become less well defined. Three experiments examined this possibility in (1) nasal vowels, where the lowest spectral prominence is broader and flatter than that of oral vowels; (2) one- versus two-formant vowels with bandwidths appropriate for oral vowels; and (3) two-formant vowels with very narrow or very wide bandwidths. The results show that, when two or more spectral peaks lie within 3.5 Bark of one another, F1 and the centroid (an amplitude-weighted average frequency that estimates the COG in the low-frequency spectrum) roughly determine the boundaries within which the perceptual COG lies. The frequencies of spectral peaks dominate responses when formant bandwidths are narrow, whereas overall spectral shape exerts more influence when spectral prominences are wide. Assuming that all vowels undergo the same processing, it is suggested that vowel quality, particularly height, is determined both by the frequency of the most prominent harmonics in the low-frequency region and by the slopes of the skirts in the vicinity of these harmonics. These two effects are most clearly separable in vowels with poorly defined spectral prominences, whose shape cannot be adequately described by specifying the frequencies and degree of prominence of just one or two harmonics or hypothetical formant peaks.
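
The two quantities named in the abstract, the 3.5 Bark separation and the centroid as an amplitude-weighted average frequency, can be illustrated with a minimal sketch. The abstract does not say which Hz-to-Bark conversion, which amplitude scale, or which low-frequency cutoff was used, so everything specific here is an assumption: the conversion is Traunmüller's (1990) approximation, amplitudes are treated as linear weights, and the function names and the 1500 Hz cutoff are hypothetical.

```python
def hz_to_bark(f_hz):
    """Convert frequency in Hz to Bark.

    Assumption: Traunmueller's (1990) approximation; the abstract does
    not state which Bark formula the experiments relied on.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def within_critical_distance(f_a_hz, f_b_hz, limit_bark=3.5):
    """True if two spectral peaks lie closer than limit_bark (3.5 Bark
    is the figure given in the abstract)."""
    return abs(hz_to_bark(f_a_hz) - hz_to_bark(f_b_hz)) < limit_bark


def centroid_hz(freqs_hz, amps, f_max_hz=1500.0):
    """Amplitude-weighted average frequency of components up to f_max_hz.

    Assumptions: amps are linear (not dB) amplitudes, and the
    "low-frequency spectrum" is cut at a hypothetical 1500 Hz.
    """
    pairs = [(f, a) for f, a in zip(freqs_hz, amps) if f <= f_max_hz]
    total = sum(a for _, a in pairs)
    if total <= 0:
        raise ValueError("no energy below the cutoff")
    return sum(f * a for f, a in pairs) / total


if __name__ == "__main__":
    # Two equal-amplitude components: the centroid falls midway between them.
    print(centroid_hz([400.0, 800.0], [1.0, 1.0]))
    # Peaks at 500 and 700 Hz are well inside the 3.5 Bark limit.
    print(within_critical_distance(500.0, 700.0))
```

With unequal amplitudes the centroid shifts toward the stronger component, which is the sense in which it estimates a center of gravity rather than a peak frequency.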

[37]  B. Delgutte Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds. , 1984, The Journal of the Acoustical Society of America.