A frequency warping approach for vocal tract length normalization

A method of vocal tract length normalization (VTLN) is proposed. It uses a bilinear transform (BLT) to modify the filterbank used in Mel-frequency cepstrum computation, based on the average third formant F3. The effectiveness of the method is examined on vowel recognition and isolated digit recognition. The baseline vowel recognition models are trained on male speakers' data, and the baseline isolated digit models are trained on adult men's data. When the MFCC coefficients of the test data are transformed by the BLT, the recognition accuracy for female speakers' vowels improves by 11.67%, and the recognition accuracies for adult women's and children's isolated digits improve by 19.5% and 13%, respectively.
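The abstract does not give the warping formula or say how F3 sets the warping factor, so the following is only a minimal sketch under stated assumptions. It assumes the common first-order all-pass form of the bilinear frequency warp, and it assumes the warping factor is chosen so that a speaker's average F3 maps onto a reference F3. The function names `blt_warp` and `alpha_from_f3`, and the example F3 values, are hypothetical and do not come from the paper.

```python
import numpy as np


def blt_warp(freq_hz, alpha, sample_rate):
    """Warp frequencies (Hz) with a first-order all-pass (bilinear) map.

    Assumed form: w' = w + 2 * atan(alpha * sin(w) / (1 - alpha * cos(w))),
    with w the normalized angular frequency in [0, pi] and |alpha| < 1.
    The map keeps 0 Hz and the Nyquist frequency fixed.
    """
    omega = 2.0 * np.pi * np.asarray(freq_hz, dtype=float) / sample_rate
    warped = omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                      1.0 - alpha * np.cos(omega))
    return warped * sample_rate / (2.0 * np.pi)


def alpha_from_f3(f3_speaker, f3_reference, sample_rate):
    """Warping factor that sends f3_speaker exactly onto f3_reference.

    Obtained by solving the assumed warp equation for alpha:
    alpha = sin((w_ref - w_spk) / 2) / sin((w_ref + w_spk) / 2).
    """
    w_spk = 2.0 * np.pi * f3_speaker / sample_rate
    w_ref = 2.0 * np.pi * f3_reference / sample_rate
    return np.sin((w_ref - w_spk) / 2.0) / np.sin((w_ref + w_spk) / 2.0)


# Illustrative values only (not from the paper): a test speaker with a
# higher average F3 than the training population.
sample_rate = 16000
alpha = alpha_from_f3(f3_speaker=2900.0, f3_reference=2500.0,
                      sample_rate=sample_rate)

# Hypothetical filterbank centre frequencies, warped toward the reference.
centres_hz = np.linspace(0.0, sample_rate / 2.0, 24)
warped_centres_hz = blt_warp(centres_hz, alpha, sample_rate)
```

In this sketch a speaker whose F3 lies above the reference gets a negative `alpha`, which compresses the frequency axis downward while leaving the band edges in place. How the paper applies the warp to the filterbank in practice may differ.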
