Real-Time Vocal Tract Length Normalization

Speaker normalization in a speech recognition can significantly improve speech recognition accuracy. One such method, vocal tract length normalization (VTLN), is especially useful when the system has to work reliably for males, females and children. It is just this situation with our phonological awareness teaching system, the “SpeechMaster”, which aims at real-time phoneme recognition and feedback. As most VTLN algorithms work off-line, this poses the additional problem of real-time operation. This paper examines how a well-known off-line algorithm can be approximated on-line by machine learning regression techniques. We conclude that, by employing a real-time estimation of VTLN parameters, the recognition error can be reduced by some 14-24 %.

[1]  Puming Zhan,et al.  Speaker normalization based on frequency warping , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Hermann Ney,et al.  Vocal tract normalization equals linear transformation in cepstral space , 2001, IEEE Transactions on Speech and Audio Processing.

[3]  Herbert Gish,et al.  A parametric approach to vocal tract length normalization , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[4]  Tanja Schultz,et al.  Linear discriminant - a new criterion for speaker normalization , 1998, ICSLP.

[5]  Philip C. Woodland,et al.  An investigation into vocal tract length normalisation , 1999, EUROSPEECH.

[6]  Louis ten Bosch,et al.  A novel feature transformation for vocal tract length normalization in automatic speech recognition , 1998, IEEE Trans. Speech Audio Process..

[7]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[8]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[9]  S. Wegmann,et al.  Speaker normalization on conversational telephone speech , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.