Modeling the perception of concurrent vowels: Role of formant transitions.
When two synthetic vowels are presented concurrently and monaurally, listeners identify the members of the pair more accurately if they differ in fundamental frequency (F0), or if one of them is preceded or followed by formant transitions that specify a glide or liquid consonant. However, formant transitions do not help listeners identify the vowel to which they are linked; instead, they make the competing vowel easier to identify. One explanation is that the formant transition region provides a brief time interval during which the competing vowel is perceptually more prominent. This interpretation is supported by the predictions of two computational models of the identification of concurrent vowels that (i) perform a frequency analysis using a bank of bandpass filters, (ii) analyze the waveform in each channel using a brief, sliding temporal window, and (iii) determine which region of the signal provides the strongest evidence of each vowel. Model A [Culling and Darwin, J. Acoust. Soc. Am. 95, 1559-1569 (1994)] computes the rms energy in each channel at successive time intervals to generate running excitation patterns that serve as input to a vowel classifier, implemented as a linear associative neural network. Model B uses a temporal analysis in each channel to generate running autocorrelation functions, and it includes a further stage of source segregation [Meddis and Hewitt, J. Acoust. Soc. Am. 91, 233-245 (1992)] to partition the channels into two groups, one group providing evidence of the periodicity of the vowel with the dominant F0, the other group providing evidence of the competing vowel. Both models predicted effects of F0 and formant transitions on identification, but model B provided more accurate predictions of the pattern of listeners' identification responses. 
Taken together, the empirical and modeling results support the idea that the identification of concurrent vowels involves an analysis of the composite waveform using a sliding temporal window, combined with a form of F0-guided source segregation.
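The three processing steps shared by the two models (bandpass filtering, a brief sliding temporal window in each channel, and a search for the region giving the strongest evidence of a vowel) can be illustrated with a minimal sketch. It is not the published implementation of either model, and several parts are assumptions made for illustration: the brick-wall FFT filters stand in for an auditory filterbank, cosine similarity against a hypothetical template stands in for the linear associative network of model A, and the source segregation stage of model B is omitted. All function names, the channel edges, and the window and hop sizes are likewise hypothetical.

```python
import numpy as np

def bandpass_bank(signal, fs, edges):
    """Split a signal into channels using brick-wall FFT bandpass filters.
    Assumption: a crude stand-in for an auditory filterbank."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        channels.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return np.array(channels)  # shape: (n_channels, n_samples)

def frame_starts(n_samples, win, hop):
    """Start indices of all full-length windows."""
    return np.arange(0, n_samples - win + 1, hop)

def running_excitation(channels, win, hop):
    """Model-A-style analysis: rms level per channel in each window."""
    starts = frame_starts(channels.shape[1], win, hop)
    return np.array([[np.sqrt(np.mean(ch[s:s + win] ** 2)) for ch in channels]
                     for s in starts])  # shape: (n_frames, n_channels)

def running_autocorrelation(channels, win, hop, max_lag):
    """Model-B-style analysis: short-time autocorrelation per channel."""
    starts = frame_starts(channels.shape[1] - max_lag, win, hop)
    acf = np.zeros((len(starts), channels.shape[0], max_lag + 1))
    for i, s in enumerate(starts):
        for c, ch in enumerate(channels):
            seg = ch[s:s + win]
            for lag in range(max_lag + 1):
                acf[i, c, lag] = np.dot(seg, ch[s + lag:s + lag + win])
    return acf  # shape: (n_frames, n_channels, max_lag + 1)

def strongest_evidence(excitation, template):
    """Return (frame index, score) of the window that best matches a template.
    Assumption: cosine similarity replaces the paper's vowel classifier."""
    norms = np.linalg.norm(excitation, axis=1) * np.linalg.norm(template)
    scores = excitation @ template / norms
    best = int(np.argmax(scores))
    return best, float(scores[best])

# Demo on a harmonic complex with F0 = 100 Hz (not a real vowel).
fs = 8000
t = np.arange(int(0.2 * fs)) / fs
f0 = 100.0
signal = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 21))

edges = np.linspace(50, 2050, 11)        # 10 channels, 200 Hz wide
channels = bandpass_bank(signal, fs, edges)

win, hop = 160, 80                       # 20 ms window, 10 ms hop
excitation = running_excitation(channels, win, hop)

acf = running_autocorrelation(channels, win, hop, max_lag=120)
summary = acf.sum(axis=1)                # pool across channels
# Search lags from 2.5 ms to 15 ms for the dominant period.
period_samples = 20 + int(np.argmax(summary[0, 20:]))
estimated_f0 = fs / period_samples

flat_template = np.ones(len(edges) - 1)  # hypothetical template
best_frame, best_score = strongest_evidence(excitation, flat_template)
```

In this sketch the pooled autocorrelation peaks at the lag matching the period of the harmonic complex, which is the kind of periodicity cue model B uses to group channels by F0. With real concurrent vowels, the per-window scores would vary over time, and the window with the highest score would mark the region where a given vowel is most prominent.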