Non-uniform scaling based speaker normalization

We present experimental results showing that our previously reported frequency warping function, which is derived purely from speech data, yields better speaker normalization. In our previous work, we numerically computed a frequency warping function for non-uniform scaling, similar to the mel scale, such that the spectral envelopes of different speakers enunciating the same sound are similar except for a possible translation. In this paper, we perform a maximum likelihood search for these translation parameters. On a telephone-based continuous digit recognition task, this non-uniform normalization scheme provides about 18% improvement over normalization based on maximum likelihood estimates of uniform scaling parameters, and about 30% improvement over a mel filterbank cepstral coefficient baseline. Another attractive attribute of the proposed method is that generating features for different shifts is simpler than generating features for different warping factors, as earlier methods require.
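The abstract's central idea is this: once frequency is warped so that speaker differences appear as a translation, normalization reduces to a search over shifts. What follows is a minimal sketch of such a maximum likelihood shift search, not the authors' implementation, and it rests on several assumptions. The spectral envelope is assumed to be already sampled on a uniform grid in the warped domain (the data-derived warping function itself is not reproduced). Shifts are taken to be whole bins. A single diagonal Gaussian over DCT cepstra stands in for the recognizer's acoustic models. All function names (`shift_spectrum`, `cepstrum`, `diag_gauss_loglik`, `ml_shift`) are hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def shift_spectrum(log_spec: Sequence[float], shift: int) -> List[float]:
    """Translate a spectrum sampled on a uniform warped-frequency grid by `shift` bins.

    Output bin i takes the value of input bin i + shift; bins that fall outside
    the band are filled with the nearest edge value (an illustrative assumption).
    """
    n = len(log_spec)
    return [log_spec[min(max(i + shift, 0), n - 1)] for i in range(n)]


def cepstrum(log_spec: Sequence[float], num_coeffs: int) -> List[float]:
    """Unnormalized DCT-II of the spectrum, as in MFCC-style cepstral features."""
    n = len(log_spec)
    return [
        sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(log_spec))
        for k in range(num_coeffs)
    ]


def diag_gauss_loglik(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log-likelihood of x under a diagonal Gaussian (a stand-in for the acoustic model)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def ml_shift(
    log_spec: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    shifts: Iterable[int],
) -> int:
    """Return the candidate shift whose cepstral features are most likely under the model."""
    num_coeffs = len(mean)
    return max(
        shifts,
        key=lambda s: diag_gauss_loglik(
            cepstrum(shift_spectrum(log_spec, s), num_coeffs), mean, var
        ),
    )


if __name__ == "__main__":
    # Toy data: a "reference" envelope peaking at warped bin 20 and a
    # "test speaker" envelope with the same shape translated to bin 23.
    reference = [math.exp(-((i - 20) ** 2) / 18.0) for i in range(64)]
    test_speaker = [math.exp(-((i - 23) ** 2) / 18.0) for i in range(64)]
    mean = cepstrum(reference, 13)
    var = [1.0] * 13
    print(ml_shift(test_speaker, mean, var, range(-8, 9)))  # → 3
```

In this sketch, each candidate shift only re-indexes the same warped spectrum before the cepstral transform. That is the kind of simplicity the abstract points to. A search over warping factors would instead recompute the warped spectrum or filterbank for every candidate factor. The edge handling, shift range, and model here are illustrative choices only.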