Study of Jacobian Normalization for VTLN

This paper addresses the divergence between the theory and practice of vocal tract length normalization (VTLN), with particular emphasis on the role of the Jacobian determinant. VTLN is placed in a Bayesian setting, which introduces a prior on the warping factor. The form of the prior, acoustic scaling, and numerical conditioning are then discussed and evaluated. It is concluded that the Jacobian determinant is important in VTLN, especially for the high-dimensional features used in HMM-based speech synthesis, and that the difficulties normally associated with the Jacobian determinant can be attributed to the prior and to scaling.
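To illustrate where the Jacobian determinant enters, the following is a minimal sketch under stated assumptions, not the paper's own derivation. It assumes that warping by a factor \alpha acts on each feature vector x_t as a linear transform A_\alpha, and that the acoustic model is a single Gaussian with mean \mu and covariance \Sigma; the symbols A_\alpha, \mu, \Sigma, and T (the number of frames) are introduced here for illustration only.

```latex
% Change of variables: the density of the unwarped feature x_t picks up
% the Jacobian determinant of the (assumed linear) warping transform.
p(x_t \mid \alpha) = \left|\det A_\alpha\right| \,
    \mathcal{N}\!\left(A_\alpha x_t;\, \mu, \Sigma\right)

% Bayesian (MAP) estimate of the warping factor with prior p(\alpha):
\hat{\alpha} = \arg\max_{\alpha} \Big[ \log p(\alpha)
    + T \log\left|\det A_\alpha\right|
    + \sum_{t=1}^{T} \log \mathcal{N}\!\left(A_\alpha x_t;\, \mu, \Sigma\right) \Big]
```

In this sketch the Jacobian term grows linearly with the number of frames T, while the prior term does not, which suggests why the relative scaling of the prior and the acoustic terms matters when choosing the warping factor.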
