Fast Robust Inverse Transform SAT and Multi-stage Adaptation

We present a new method of Speaker Adapted Training (SAT) that is more robust and faster than previous methods, and results in a lower error rate. The method, called Inverse Transform SAT (ITSAT), is based on removing the differences between speakers before training, rather than modeling those differences during training. We develop several methods to avoid the problems associated with inverting the transformation. In one method, we interpolate the transformation matrix with an identity or diagonal transformation. We also apply constraints to the matrix to avoid estimation problems. We show that by using many diagonal-only transformation matrices with constraints, we can achieve performance comparable to that of the original SAT method at a fraction of the cost. In addition, we describe a multi-stage approach to Maximum Likelihood Linear Regression (MLLR) unsupervised adaptation and show that it is more effective than a single-stage regular MLLR adaptation. As a final stage, we adapt the resulting model at a finer resolution using Maximum A Posteriori (MAP) adaptation. With the combination of all the above adaptation methods, we obtain a 13.6% overall reduction in word error rate (WER) relative to Speaker Independent (SI) training and decoding.
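The abstract gives no formulas, so the following is a minimal NumPy sketch, under stated assumptions, of the operations it names: blending an estimated transform with an identity or diagonal matrix, mapping a speaker's features back to a canonical space with the inverse of an affine MLLR-style transform x = A x_c + b, constraining a diagonal-only transform, and a standard MAP update of a Gaussian mean. The interpolation form lam * A + (1 - lam) * T, the clipping bounds, the prior weight tau, and all function names (`interpolate_transform`, `constrain_diagonal`, `inverse_transform_features`, `map_mean`) are hypothetical illustrations, not the authors' implementation. The sketch also uses a single global transform rather than the many per-class transforms described above, and it does not show the multi-stage MLLR estimation itself.

```python
import numpy as np


def interpolate_transform(A, lam, target="identity"):
    """Blend an estimated transform A with a simpler matrix T.

    Assumed (hypothetical) form: lam * A + (1 - lam) * T, with T = I or
    T = diag(A). Smaller lam pulls A toward a matrix that is safe to invert.
    """
    if target == "identity":
        T = np.eye(A.shape[0])
    elif target == "diagonal":
        T = np.diag(np.diag(A))
    else:
        raise ValueError(f"unknown target: {target!r}")
    return lam * A + (1.0 - lam) * T


def constrain_diagonal(d, lo=0.5, hi=2.0):
    """Clip diagonal scale factors into [lo, hi] (bounds are illustrative)."""
    return np.clip(d, lo, hi)


def inverse_transform_features(X, A, b):
    """Map speaker frames to canonical space: x_c = A^{-1} (x - b).

    X has shape (T, D), one frame per row. Uses a linear solve
    instead of forming A^{-1} explicitly.
    """
    return np.linalg.solve(A, (X - b).T).T


def map_mean(mu_prior, X, gamma, tau=10.0):
    """MAP mean update: (tau * mu0 + sum_t gamma_t x_t) / (tau + sum_t gamma_t).

    gamma holds per-frame occupation probabilities for one Gaussian.
    """
    return (tau * mu_prior + gamma @ X) / (tau + gamma.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.array([[1.2, 0.3, 0.0],
                  [0.1, 0.9, 0.2],
                  [0.0, 0.05, 1.1]])
    b = np.array([0.5, -0.2, 0.1])
    X_canon = rng.normal(size=(100, 3))
    X_spk = X_canon @ A.T + b  # simulate one speaker: x = A x_c + b

    # The full inverse recovers the canonical frames.
    print(np.allclose(inverse_transform_features(X_spk, A, b), X_canon))  # True

    # Interpolating toward the identity before inverting (lam < 1).
    X_soft = inverse_transform_features(X_spk, interpolate_transform(A, 0.7), b)

    # Diagonal-only variant: the inverse is an elementwise division.
    d = constrain_diagonal(np.array([0.1, 1.3, 3.0]))  # [0.5, 1.3, 2.0]
    X_diag = (X_spk - b) / d

    # MAP update of one Gaussian mean from 100 frames with soft counts.
    gamma = rng.uniform(size=100)
    print(map_mean(np.zeros(3), X_canon, gamma))
```

Solving the linear system avoids forming the inverse explicitly. When an estimated A is nearly singular, pulling it toward the identity (smaller `lam`) or restricting it to a clipped diagonal keeps the inverse well conditioned, which matches the motivation the abstract gives for interpolation and constraints. In the MAP update, `tau` controls how far the mean moves from its prior: Gaussians with little adaptation data stay close to the prior mean.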

Francis Kubala | Hubert Jin | Spyros Matsoukas | Richard Schwartz
