VTLN in the MFCC Domain: Band-Limited versus Local Interpolation

We propose a new, easy-to-implement method for computing a Linear Transform (LT) that performs Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on a local interpolation that is independent of the Mel filter design. Local Interpolation VTLN (LILT-VTLN) is compared, theoretically and experimentally, to a global scheme based on band-limited interpolation (BLI-VTLN) and to the conventional frequency-warping scheme (FFT-VTLN). An investigation of the interoperability of these methods shows that the performance of LILT-VTLN is on par with that of FFT-VTLN and BLI-VTLN. The performance of models trained with LILT-VTLN or BLI-VTLN degrades if FFT-VTLN is used as a front-end. The degradation is slightly smaller for LILT-VTLN, indicating that it produces models that are a better match for FFT-VTLN.
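To make the idea of a linear transform acting on truncated MFCCs concrete, the following is a minimal sketch of one generic construction, not the method of this paper: the cepstra are mapped back to a smoothed log filterbank spectrum with a truncated inverse DCT, resampled with a local (two-point linear) interpolation matrix, and mapped forward again with the DCT. All function names, the unnormalized DCT-II convention, the band counts, and the simple linear warp `alpha * m` on the band-index axis are assumptions chosen for illustration only.

```python
import numpy as np

def dct_matrix(n_ceps, n_bands):
    """Unnormalized DCT-II rows: maps n_bands log energies to n_ceps cepstra."""
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_bands)[None, :]
    return np.cos(np.pi * k * (m + 0.5) / n_bands)

def idct_matrix(n_ceps, n_bands):
    """Inverse of the full DCT, keeping only the columns for the first
    n_ceps cepstra (the discarded higher cepstra are treated as zero)."""
    full = dct_matrix(n_bands, n_bands)
    return np.linalg.inv(full)[:, :n_ceps]

def local_interp_matrix(n_bands, alpha):
    """Two-point linear interpolation matrix. Row m reads the spectrum at
    position alpha * m (hypothetical placeholder warp, clipped to the band range)."""
    pos = np.clip(alpha * np.arange(n_bands), 0, n_bands - 1)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n_bands - 1)
    frac = pos - lo
    rows = np.arange(n_bands)
    W = np.zeros((n_bands, n_bands))
    W[rows, lo] += 1.0 - frac
    W[rows, hi] += frac
    return W

def vtln_transform(n_ceps, n_bands, alpha):
    """Cepstral-domain matrix A such that A @ mfcc approximates the warped MFCCs."""
    C = dct_matrix(n_ceps, n_bands)
    C_inv = idct_matrix(n_ceps, n_bands)
    return C @ local_interp_matrix(n_bands, alpha) @ C_inv

# Example: 13 cepstra, 23 filterbank channels, warp factor 0.9 (illustrative values).
A = vtln_transform(13, 23, 0.9)
mfcc = np.random.default_rng(0).standard_normal(13)
warped = A @ mfcc
```

Because each row of the interpolation matrix touches at most two neighbouring bands, the construction needs no knowledge of the filter shapes themselves. With `alpha = 1.0` the interpolation matrix is the identity, and so is the resulting transform.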
