Acoustic class specific VTLN-warping using regression class trees

In this paper, we study the use of different frequency warp-factors for different acoustic classes in a computationally efficient framework of Vocal Tract Length Normalization (VTLN). This is motivated by the fact that not all acoustic classes exhibit similar spectral variations as a result of physiological differences in the vocal tract; therefore, a single frequency warp for the entire utterance may not be appropriate. We have recently proposed a VTLN method that implements VTLN-warping through a linear transformation (LT) of the conventional MFCC features and efficiently estimates the warp-factor using the same sufficient statistics as those used in CMLLR adaptation. In this paper, we show that, within this framework and using the idea of a regression class tree, we can obtain separate VTLN-warping for different acoustic classes. The use of a regression class tree ensures that a warp-factor is estimated for each class even when very little data is available for that class. The acoustic classes can, in general, be any collection of the Gaussian components in the acoustic model. We build acoustic classes using both a data-driven approach and phonetic knowledge. Using the WSJ database, we report the recognition performance of the proposed acoustic-class-specific warp-factors for both the data-driven and the phonetic-knowledge-based regression class tree definitions, and compare it with the single warp-factor case.

Index Terms: VTLN, Acoustic-Class Specific Warping, Regression Class Tree, Linear Transform
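The back-off behaviour attributed to the regression class tree can be illustrated with a small sketch. This is not the paper's implementation: the `Node` class, the `min_occupancy` threshold, the candidate warp-factor grid, and the per-class score tables are all hypothetical, and the sketch assumes that the score of a candidate warp-factor at an internal node is simply the sum of the scores of its children (as would hold for an auxiliary function accumulated over Gaussian components). Each leaf class takes the best warp-factor from the nearest ancestor that has seen enough data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a (hypothetical) regression class tree."""
    name: str
    occupancy: float = 0.0                      # amount of data seen by this class
    scores: Dict[float, float] = field(default_factory=dict)  # warp-factor -> score
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def accumulate(node: Node) -> None:
    """Pool occupancy and scores bottom-up (assumes scores are additive)."""
    if not node.children:
        return
    node.occupancy = 0.0
    node.scores = {}
    for child in node.children:
        accumulate(child)
        node.occupancy += child.occupancy
        for alpha, score in child.scores.items():
            node.scores[alpha] = node.scores.get(alpha, 0.0) + score


def assign_warps(root: Node, min_occupancy: float) -> Dict[str, float]:
    """Give every leaf class a warp-factor, backing off to an ancestor if data is scarce."""
    accumulate(root)
    result: Dict[str, float] = {}

    def visit(node: Node) -> None:
        if not node.children:
            source = node
            # Walk up until a node with enough data is found (the root is the last resort).
            while source.parent is not None and source.occupancy < min_occupancy:
                source = source.parent
            result[node.name] = max(source.scores, key=source.scores.get)
        for child in node.children:
            visit(child)

    visit(root)
    return result


def build_demo_tree() -> Node:
    """Toy tree with made-up numbers: one data-rich class and one data-poor class."""
    root = Node("root")
    root.add(Node("vowels", occupancy=500.0,
                  scores={0.9: -10.0, 1.0: -12.0, 1.1: -15.0}))
    root.add(Node("fricatives", occupancy=20.0,
                  scores={0.9: -3.0, 1.0: -2.5, 1.1: -1.0}))
    return root


if __name__ == "__main__":
    print(assign_warps(build_demo_tree(), min_occupancy=100.0))
    print(assign_warps(build_demo_tree(), min_occupancy=10.0))
```

With the high threshold, the data-poor class inherits the warp-factor chosen from the pooled root statistics; with the low threshold, it keeps its own estimate. The threshold therefore trades class specificity against robustness of the estimate.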
