Normalization of vowels by vocal-tract length and its application to vowel identification

A new approach to speech parameter normalization is presented in which no prior knowledge about the input speakers is required. The vocal-tract length and area function are first estimated from the acoustic speech waveform, and then the area function is normalized to an acoustic tube of the same shape having a certain reference length. The normalized formant frequencies are defined as the resonance frequencies of this acoustic tube. The distributions of unnormalized and normalized formant frequencies for 9 stationary American vowels were investigated with 14 male and 12 female speakers. Fairly compact distributions of the vowels in the normalized F1-F2-F3 space were obtained. A preliminary identification test for stationary vowels based on this normalization method showed an expected average recognition rate of 84-96 percent for arbitrarily selected speakers, depending on the phonetic criteria adopted for defining "correct" identification.
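
The length-normalization step can be illustrated with a minimal sketch. It assumes a lossless tube whose shape is held fixed while its length is rescaled, in which case every resonance frequency scales inversely with length. The estimation of vocal-tract length and area function from the waveform is not shown here. The function names, the 17.5 cm reference length, and the 35000 cm/s speed of sound are illustrative assumptions, not values taken from the paper.

```python
def normalize_formants(formants_hz, tract_length_cm, reference_length_cm=17.5):
    """Rescale formants to those of a same-shape tube of the reference length.

    Assumption: for a lossless tube of fixed shape, resonances are inversely
    proportional to length, so F_norm = F * (L / L_ref).
    The 17.5 cm default is a hypothetical reference length.
    """
    if tract_length_cm <= 0 or reference_length_cm <= 0:
        raise ValueError("lengths must be positive")
    scale = tract_length_cm / reference_length_cm
    return [f * scale for f in formants_hz]


def uniform_tube_resonances(length_cm, count=3, sound_speed_cm_s=35000.0):
    """Quarter-wavelength resonances of a uniform tube closed at one end.

    F_n = (2n - 1) * c / (4 * L); the speed of sound here is an assumed value.
    """
    if length_cm <= 0:
        raise ValueError("length must be positive")
    return [(2 * n - 1) * sound_speed_cm_s / (4.0 * length_cm)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # Sanity check: a short uniform tube, once normalized, should match
    # the resonances of a uniform tube of the reference length.
    short_tube = uniform_tube_resonances(14.0)
    print(normalize_formants(short_tube, 14.0, 17.5))
    print(uniform_tube_resonances(17.5))
```

In this sketch a speaker with a shorter estimated tract has all formants lowered by the same factor, which is the simplest case of mapping the estimated area function onto a tube of reference length; a non-uniform area function would require computing the tube resonances explicitly rather than applying a single scale factor to measured formants.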