On Extending VTLN to Phoneme-specific Warping in Automatic Speech Recognition

Phoneme- and formant-specific warping has been shown to decrease formant and cepstral mismatch. These findings have not yet been fully implemented in speech recognition. This paper discusses a few reasons why this may be the case. It also includes a small experimental study in which phoneme-independent warping is extended towards phoneme-specific warping. The results of this investigation did not show a significant decrease in recognition error rate, which is in line with earlier experiments with the methods discussed in the paper.