Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition

In this work we extend the test-utterance adaptation technique used in vocal tract length normalization (VTLN) to a larger number of speaker-characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piecewise linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children's speech database, joint estimation of the warping factor and variance scaling lowers the recognition error rate compared with estimating the warping factor alone.
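
To illustrate two of the four features, the following minimal sketch shows one common form of a piecewise linear frequency warping function controlled by a warping factor and a corner position. The abstract does not give the exact parameterization used in the paper, so the function name `piecewise_linear_warp`, its arguments, and the choice to map the upper band edge onto itself are assumptions made for illustration only.

```python
def piecewise_linear_warp(freq_hz, alpha, corner_hz, max_hz):
    """Warp a frequency with a two-segment piecewise linear function.

    Assumed form (not taken from the paper):
      - below the corner, the frequency is scaled by alpha;
      - above the corner, a second linear segment joins the point
        (corner_hz, alpha * corner_hz) to (max_hz, max_hz), so the
        upper band edge is left unchanged.
    """
    if not 0.0 < corner_hz < max_hz:
        raise ValueError("corner_hz must lie strictly between 0 and max_hz")
    if not 0.0 <= freq_hz <= max_hz:
        raise ValueError("freq_hz must lie in [0, max_hz]")

    if freq_hz <= corner_hz:
        # First segment: plain linear scaling by the warping factor.
        return alpha * freq_hz

    # Second segment: slope chosen so that max_hz maps onto max_hz.
    slope = (max_hz - alpha * corner_hz) / (max_hz - corner_hz)
    return alpha * corner_hz + slope * (freq_hz - corner_hz)


if __name__ == "__main__":
    # Hypothetical values: 8 kHz bandwidth, corner at 6 kHz, alpha = 1.2.
    for f in (0.0, 1000.0, 6000.0, 7000.0, 8000.0):
        print(f, piecewise_linear_warp(f, alpha=1.2, corner_hz=6000.0, max_hz=8000.0))
```

In this form the warping factor sets the slope of the first segment and the corner position determines where the compensating second segment begins, so the two parameters can be varied independently in a search over candidate values.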
