Study of Non-Linear Frequency Warping Functions for Speaker Normalization

In this paper, we study non-linear frequency-warping functions that are commonly used in speaker normalization. This study is motivated by our recently proposed affine-transformation model for speaker normalization, which has provided improved recognition performance compared to the uniform-scaling model. Using formant data from the Peterson & Barney and Hillenbrand vowel databases, we analyze the behavior of the scale factor as a function of frequency. The empirical observations show that while the uniform-scaling assumption may be valid at higher frequencies, there are significant deviations at low frequencies. We show that while our recently proposed model behaves similarly to the empirical result, the behavior of many commonly used non-linear models (including the Eide-Gish, power-law, and bilinear-transformation models) differs significantly from it. This difference may explain the limited improvement in recognition performance that these non-linear models provide over the conventional uniform-scaling model. We also show that our proposed model fits the formant data better than these non-linear models. We therefore conclude that the affine-transformation model may be a more appropriate non-linear model for speaker normalization.
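To make the notion of a frequency-dependent scale factor concrete, the following is a minimal sketch that evaluates the ratio of warped to original frequency for three warping functions in their common textbook forms: uniform scaling, a power law, and the first-order all-pass (bilinear) warp. The exact parameterizations used in the paper are not given in the abstract, so the function names, the 8 kHz band edge, and the parameter values below are assumptions chosen for illustration only. The Eide-Gish and affine-transformation models are omitted because their functional forms are not stated here.

```python
import math

# Assumed band edge in Hz; illustrative only, not taken from the paper.
NYQUIST_HZ = 8000.0


def uniform_warp(f, alpha):
    """Uniform scaling: every frequency is multiplied by the same factor."""
    return alpha * f


def power_law_warp(f, beta, f_max=NYQUIST_HZ):
    """Power-law warp in a common textbook form; maps f_max to itself."""
    return f_max * (f / f_max) ** beta


def bilinear_warp(f, a, f_max=NYQUIST_HZ):
    """Phase response of a first-order all-pass (bilinear) transform."""
    w = math.pi * f / f_max  # normalized angular frequency in [0, pi]
    w_warped = w + 2.0 * math.atan2(a * math.sin(w), 1.0 - a * math.cos(w))
    return f_max * w_warped / math.pi


def scale_factor(warp, f, *params):
    """Effective scale factor at frequency f: warped frequency over f."""
    return warp(f, *params) / f


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the trends.
    for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(
            f"{f:7.0f} Hz"
            f"  uniform={scale_factor(uniform_warp, f, 1.1):.3f}"
            f"  power={scale_factor(power_law_warp, f, 0.9):.3f}"
            f"  bilinear={scale_factor(bilinear_warp, f, 0.1):.3f}"
        )
```

Under these assumed forms, the uniform model gives a constant scale factor by construction, while the power-law and bilinear warps give a scale factor that varies with frequency. Comparing such curves against a scale factor estimated from formant data is the kind of analysis the abstract describes.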

[1] S. V. Bharath Kumar. Uniform speaker normalization using frequency-dependent scaling function, 2004.

[2] William J. Byrne, et al. Speaker normalization with all-pass transforms, 1998, ICSLP.

[3] Srinivasan Umesh, et al. Non-uniform scaling based speaker normalization, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] G. E. Peterson, et al. Control Methods Used in a Study of the Vowels, 1951.

[5] Srinivasan Umesh, et al. Non-uniform speaker normalization using affine-transformation, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] S. Umesh, et al. Frequency warping and the Mel scale, 2002, IEEE Signal Processing Letters.

[7] Herbert Gish, et al. A parametric approach to vocal tract length normalization, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[8] Richard M. Stern, et al. Robust speech recognition by normalization of the acoustic space, 1991, Proceedings of ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[9] G. Fant. Non-uniform vowel normalization, 1975.

[10] S. V. Bharath Kumar. Uniform speaker normalization using frequency-dependent scaling function, 2004, 2004 International Conference on Signal Processing and Communications (SPCOM '04).

[11] Srinivasan Umesh, et al. A simple approach to non-uniform vowel normalization, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[12] Mark A. Fanty, et al. Rapid unsupervised adaptation to children's speech on a connected-digit task, 1996, Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96).

[13] J. Hillenbrand, et al. Acoustic characteristics of American English vowels, 1994, The Journal of the Acoustical Society of America.