Missing-data model of vowel identification.

Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the position of the first two or three formant peaks. However, in voiced speech the transfer function is sampled at multiples of the fundamental frequency (F0), and the short-term spectrum contains peaks at those frequencies rather than at the formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region, and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. It is most acute at high F0s, where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to the available data, and missing data are ignored by means of an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0 independence observed in vowel identification.
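As a rough illustration of the frequency-domain idea, the sketch below matches a harmonic line spectrum against stored envelope templates using a weight that is large near multiples of F0 and small elsewhere. It is a minimal sketch under stated assumptions, not the model described in the paper: the Gaussian envelope shape, the Gaussian weighting function and its width, the two-formant templates, the frequency grid, and all function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical two-formant templates (Hz); rough illustrative values only.
TEMPLATES = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}

# Assumed analysis grid: 10 Hz steps up to 4 kHz.
GRID_HZ = range(10, 4001, 10)


def envelope(freq_hz, formants_hz, bandwidth_hz=120.0):
    """Toy spectral envelope: one Gaussian bump per formant (an assumption)."""
    return sum(
        math.exp(-0.5 * ((freq_hz - f) / bandwidth_hz) ** 2) for f in formants_hz
    )


def harmonic_weight(freq_hz, f0_hz, width_hz=4.0):
    """Assumed F0-dependent weight: near 1 at harmonics, near 0 between them."""
    nearest = max(1, round(freq_hz / f0_hz)) * f0_hz
    return math.exp(-0.5 * ((freq_hz - nearest) / width_hz) ** 2)


def line_spectrum(formants_hz, f0_hz):
    """Envelope sampled at multiples of F0; zero at every other grid point."""
    return [envelope(f, formants_hz) if f % f0_hz == 0 else 0.0 for f in GRID_HZ]


def weighted_distance(spectrum, formants_hz, f0_hz):
    """Weighted squared error between observed spectrum and template envelope."""
    num = 0.0
    den = 0.0
    for f, s in zip(GRID_HZ, spectrum):
        w = harmonic_weight(f, f0_hz)
        num += w * (s - envelope(f, formants_hz)) ** 2
        den += w
    return num / den


def identify(spectrum, f0_hz):
    """Return the template label with the smallest weighted distance."""
    return min(
        TEMPLATES, key=lambda v: weighted_distance(spectrum, TEMPLATES[v], f0_hz)
    )


if __name__ == "__main__":
    for f0 in (100, 200, 300):
        observed = line_spectrum(TEMPLATES["a"], f0)
        print(f0, identify(observed, f0))
```

Because the weight suppresses the regions between harmonics, the mismatch between a line spectrum and a smooth template there contributes little to the distance, which is the sense in which the missing data are ignored. F0 is taken as known here; how it is estimated is outside the scope of this sketch.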
