Do 'Dominant Frequencies' explain the listener's response to formant and spectrum shape variations?

Psychoacoustic experimentation shows that formant frequency shifts can give rise to more significant changes in phonetic vowel timbre than differences in overall level, bandwidth, spectral tilt, and formant amplitudes. Carlson and Granström's perceptual and computational findings suggest that, in addition to spectral representations, the human ear uses temporal information on formant periodicities ('Dominant Frequencies', DF) in building vowel timbre percepts. The availability of such temporal coding in the cat's auditory nerve fibers has been demonstrated in numerous physiological investigations undertaken during recent decades. In this paper we explore, and provide further support for, the Dominant Frequency hypothesis using KONVERT, a computational auditory model. KONVERT derives auditory excitation patterns for vowels by performing a critical-band analysis. It also simulates phase locking in auditory neurons and outputs DF histograms. The modeling supports the assumption that listeners judge phonetic distance among vowels on the basis of formant frequency differences, as determined primarily by a time-based analysis. However, when instructed to judge psychophysical distance among vowels, they can also use spectral differences such as formant bandwidth, formant amplitudes, and spectral tilt. Although there has been considerable debate among psychoacousticians about the functional role of phase locking in monaural hearing, the present research suggests that detailed temporal information may nonetheless play a significant role in speech perception.
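The idea of a Dominant Frequency histogram can be illustrated with a minimal sketch. This is not KONVERT: the filter shape (Gaussian weights in the frequency domain), the bandwidth formula (the Glasberg and Moore ERB expression, used as a stand-in for a critical band), the zero-crossing estimator (a crude proxy for interval statistics of phase-locked firing), and all function names are assumptions made for illustration only. The test signal is two sinusoids at 500 and 1500 Hz standing in for two formants.

```python
import numpy as np

def erb_bandwidth(fc_hz):
    """ERB in Hz (Glasberg & Moore form); an assumed stand-in for a critical band."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def dominant_frequencies(signal, fs, centers_hz):
    """Estimate one dominant frequency per channel from upward zero crossings."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    duration = n / fs
    dfs = []
    for fc in centers_hz:
        # Zero-phase band-pass: Gaussian weighting around the channel center.
        weights = np.exp(-0.5 * ((freqs - fc) / erb_bandwidth(fc)) ** 2)
        band = np.fft.irfft(spectrum * weights, n=n)
        # Count negative-to-nonnegative transitions (upward zero crossings).
        upward = np.count_nonzero((band[:-1] < 0) & (band[1:] >= 0))
        dfs.append(upward / duration)
    return np.array(dfs)

def demo():
    fs = 16000
    t = np.arange(fs) / fs  # one second of signal
    # Two "formants" (hypothetical values); phase offsets avoid samples at exactly zero.
    signal = np.sin(2 * np.pi * 500 * t + 0.3) + np.sin(2 * np.pi * 1500 * t + 1.0)
    centers = np.geomspace(200.0, 3000.0, 30)  # log-spaced channel centers
    dfs = dominant_frequencies(signal, fs, centers)
    # DF histogram: number of channels captured by each frequency region.
    counts, edges = np.histogram(dfs, bins=np.arange(125.0, 3126.0, 250.0))
    return centers, dfs, counts, edges

if __name__ == "__main__":
    centers, dfs, counts, edges = demo()
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        print(f"{lo:6.0f}-{hi:6.0f} Hz: {c} channels")
```

In this sketch, most channels are "captured" by whichever component dominates their passband, so the histogram clusters around the two component frequencies regardless of their relative amplitudes. That is the sense in which a time-based measure could emphasize formant frequency over bandwidth, amplitude, or tilt.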
