On Extending VTLN to Phoneme-specific Warping in Automatic Speech Recognition

Phoneme- and formant-specific warping has been shown to decrease formant and cepstral mismatch. These findings have not yet been fully exploited in speech recognition, and this paper discusses some possible reasons why. A small experimental study is also included in which phoneme-independent warping is extended towards phoneme-specific warping. The results of this investigation did not show a significant reduction in recognition error rate. This is consistent with earlier experiments using the methods discussed in the paper.
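To make the difference between phoneme-independent and phoneme-specific warping concrete, the Python sketch below applies a piecewise-linear frequency warp whose factor is either a single value per speaker or, when a table is supplied, a value per phoneme. The warp shape, the breakpoint at 80% of the Nyquist frequency, the example frequencies and factors, and the names `warp_frequency`, `alpha_for` and `phoneme_alphas` are all illustrative assumptions. This is not the method used in the paper.

```python
import numpy as np


def warp_frequency(f, alpha, nyquist):
    """Piecewise-linear frequency warp (assumed form, for illustration only).

    Frequencies up to a breakpoint f_b are scaled by alpha; above f_b a second
    linear segment maps [f_b, nyquist] onto [alpha * f_b, nyquist] so the
    warped axis still ends at the Nyquist frequency.
    """
    f = np.asarray(f, dtype=float)
    # Hypothetical breakpoint: 80% of Nyquist, lowered for alpha > 1 so that
    # alpha * f_b stays below the Nyquist frequency.
    f_b = 0.8 * nyquist / max(alpha, 1.0)
    upper_slope = (nyquist - alpha * f_b) / (nyquist - f_b)
    return np.where(f <= f_b, alpha * f, alpha * f_b + upper_slope * (f - f_b))


def alpha_for(phoneme, speaker_alpha, phoneme_alphas=None):
    """Return the warp factor to use for one phoneme.

    Phoneme-independent warping: phoneme_alphas is None, so every phoneme
    uses speaker_alpha. Phoneme-specific warping: look the phoneme up in a
    (hypothetical) table and fall back to speaker_alpha if it is missing.
    """
    if phoneme_alphas is None:
        return speaker_alpha
    return phoneme_alphas.get(phoneme, speaker_alpha)


if __name__ == "__main__":
    nyquist = 8000.0                         # 16 kHz sampling, assumed
    freqs = [300.0, 2300.0, 3000.0]          # example frequencies in Hz, assumed
    speaker_alpha = 1.1                      # hypothetical speaker-level factor
    phoneme_alphas = {"i": 1.15, "a": 1.05}  # hypothetical per-phoneme factors

    for ph in ["i", "a", "s"]:
        shared = warp_frequency(freqs, alpha_for(ph, speaker_alpha), nyquist)
        specific = warp_frequency(
            freqs, alpha_for(ph, speaker_alpha, phoneme_alphas), nyquist)
        print(ph, np.round(shared), np.round(specific))
```

With a single speaker factor every phoneme is scaled by the same amount. Passing a per-phoneme table lets, for example, /i/ receive a larger factor than /a/, while phonemes absent from the table (here /s/) fall back to the speaker-level factor. How such per-phoneme factors would be estimated is not shown.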

Daniel Elenius | Mats Blomberg