Paper information - Fast robust inverse transform speaker adapted training using diagonal transformations

We present a new method of speaker adapted training (SAT) that is more robust, faster, and yields a lower error rate than previous methods. The method, called inverse transform SAT (IT-SAT), is based on removing the differences between speakers before training, rather than modeling the differences during training. We develop several methods to avoid the problems associated with inverting the transformation. In one method, we interpolate the transformation matrix with an identity or diagonal transformation. We also apply constraints to the matrix to avoid estimation problems. Finally, we show that the resulting method is much faster, requires much less disk space, and results in higher accuracy than the original SAT method.
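The interpolation idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: an affine speaker transform of the form x_spk = A x_norm + b, a linear blend controlled by a hypothetical `weight` parameter, and the function names `interpolate_transform` and `inverse_transform_features`. It shows only why blending a poorly conditioned matrix toward an identity or diagonal matrix can make the inversion safer; it is not the authors' implementation.

```python
import numpy as np


def interpolate_transform(A, weight, target="identity"):
    """Blend A with an identity or diagonal matrix (assumed linear blend).

    weight = 0.0 returns A unchanged; weight = 1.0 returns the target.
    The "diagonal" target keeps only the diagonal of A itself.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if target == "identity":
        T = np.eye(A.shape[0])
    elif target == "diagonal":
        T = np.diag(np.diag(A))
    else:
        raise ValueError("target must be 'identity' or 'diagonal'")
    return (1.0 - weight) * A + weight * T


def inverse_transform_features(X, A, b):
    """Map speaker features X (n x d) back to the normalized space.

    Assumes each row satisfies x_spk = A @ x_norm + b, so the rows of the
    result solve A @ x_norm = x_spk - b. Using solve avoids forming the
    explicit inverse of A.
    """
    return np.linalg.solve(A, (X - b).T).T


if __name__ == "__main__":
    # A nearly singular matrix: inverting it directly is numerically fragile.
    A = np.array([[1.0, 1.0], [1.0, 1.0001]])
    A_safe = interpolate_transform(A, weight=0.5, target="identity")
    print("condition number before:", np.linalg.cond(A))
    print("condition number after: ", np.linalg.cond(A_safe))
```

In this sketch a larger `weight` trades fidelity to the estimated transform for a better conditioned inverse; how the original work chooses that trade-off is not stated in the abstract.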

Richard M. Schwartz | Spyridon Matsoukas | Francis Kubala | Hubert Jin
