Paper information - Knowledge-Rich Model Transformations for Speaker Normalization in Speech Recognition

In this work, we extend the test-utterance adaptation technique used in vocal tract length normalization (VTLN) to a larger number of speaker-characteristic features. We perform partially joint estimation of four features: the VTLN warping factor, the corner position of the piecewise-linear warping function, spectral tilt in voiced segments, and model variance scaling. In experiments on the Swedish PF-Star children database, joint estimation of the warping factor and variance scaling lowers the recognition error rate compared with estimating the warping factor alone.

Daniel Elenius | Mats Blomberg
