A method for compensation of Jacobian in speaker normalization

In the conventional maximum-likelihood approach to speaker normalization, the optimal frequency-warping factor is estimated by a grid search that maximizes the likelihood of the warped features. The conventional likelihood computation for warped features does not account for the Jacobian of the transformation. Some researchers have pointed this out and have also shown that frequency warping is equivalent to a transformation in the cepstral domain. As an approximation, variance normalization of the cepstral features is applied before the likelihood computation to account for the Jacobian. In this paper, we suggest an alternative method that avoids the Jacobian problem. A preliminary investigation on a digit recognition task shows that the proposed method improves normalization performance over the conventional method of warping factor estimation.
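To make the Jacobian issue concrete, the following is a minimal toy sketch, not the method proposed in this paper. It assumes, purely for illustration, that the warping acts on the cepstral features as a uniform scaling y = alpha * x, so that the log-Jacobian per frame is dim * log(alpha), and that the acoustic model is a single standard Gaussian. The function names, the grid, and the synthetic data are hypothetical.

```python
import math
import random


def log_likelihood(frames, alpha, with_jacobian):
    """Total log-likelihood of alpha-scaled frames under a N(0, I) model."""
    dim = len(frames[0])
    total = 0.0
    for x in frames:
        # Toy stand-in for the warped feature vector (assumption: y = alpha * x).
        y = [alpha * v for v in x]
        total += -0.5 * sum(v * v for v in y) - 0.5 * dim * math.log(2.0 * math.pi)
        if with_jacobian:
            # log|det(alpha * I)| = dim * log(alpha)
            total += dim * math.log(alpha)
    return total


def estimate_alpha(frames, grid, with_jacobian):
    """Grid search: return the warping factor with the highest log-likelihood."""
    return max(grid, key=lambda a: log_likelihood(frames, a, with_jacobian))


def make_frames(alpha_true, n_frames=2000, dim=13, seed=0):
    """Synthetic frames that match the N(0, I) model once scaled by alpha_true."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) / alpha_true for _ in range(dim)]
            for _ in range(n_frames)]


# Hypothetical search grid: 0.80, 0.82, ..., 1.20.
GRID = [round(0.80 + 0.02 * i, 2) for i in range(21)]

if __name__ == "__main__":
    frames = make_frames(alpha_true=1.10)
    print("without Jacobian:", estimate_alpha(frames, GRID, with_jacobian=False))
    print("with Jacobian:   ", estimate_alpha(frames, GRID, with_jacobian=True))
```

In this toy setting the uncompensated search always drifts to the smallest factor on the grid, because shrinking the features moves them toward the model mean and raises the likelihood regardless of the true warp. Adding the log-Jacobian term removes that bias, so the estimate lands close to the factor used to generate the data.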
