Fast Robust Inverse Transform SAT and Multi-stage Adaptation

We present a new method of Speaker Adapted Training (SAT) that is more robust, faster, and results in a lower error rate than previous methods. The method, called Inverse Transform SAT (ITSAT), is based on removing the differences between speakers before training, rather than modeling the differences during training. We develop several methods to avoid the problems associated with inverting the transformation. In one method, we interpolate the transformation matrix with an identity or diagonal transformation. We also apply constraints to the matrix to avoid estimation problems. We show that by using many diagonal-only transformation matrices with constraints we can achieve performance comparable to that of the original SAT method at a fraction of the cost. In addition, we describe a multi-stage approach to unsupervised Maximum Likelihood Linear Regression (MLLR) adaptation, and we show that it is more effective than regular single-stage MLLR adaptation. As a final stage, we adapt the resulting model at a finer resolution, using Maximum A Posteriori (MAP) adaptation. With the combination of all the above adaptation methods we obtain a 13.6% overall reduction in WER relative to Speaker Independent (SI) training and decoding.
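
To make the idea of a safely invertible transformation concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a diagonal-only affine transform per speaker, y = scale * x + offset in each feature dimension. The function names, the interpolation weight, and the clamping bounds (0.5 and 2.0) are hypothetical choices made for illustration only.

```python
def interpolate_with_identity(scale, offset, weight):
    """Blend a diagonal affine transform with the identity transform.

    weight = 0 keeps the estimated transform; weight = 1 gives the identity.
    (The linear blend is an assumption made for this sketch.)
    """
    new_scale = [(1.0 - weight) * s + weight * 1.0 for s in scale]
    new_offset = [(1.0 - weight) * o for o in offset]
    return new_scale, new_offset


def constrain_scale(scale, low=0.5, high=2.0):
    """Clamp each diagonal entry so the inverse stays well conditioned.

    The bounds are hypothetical, chosen only for illustration.
    """
    return [min(max(s, low), high) for s in scale]


def apply_inverse(frame, scale, offset):
    """Map one speaker-dependent feature frame back toward the
    speaker-independent space: x = (y - offset) / scale per dimension."""
    return [(y - o) / s for y, s, o in zip(frame, scale, offset)]


# A poorly estimated transform: the second scale is zero, so the raw
# transform cannot be inverted.
raw_scale, raw_offset = [2.0, 0.0], [1.0, -1.0]

scale, offset = interpolate_with_identity(raw_scale, raw_offset, weight=0.5)
scale = constrain_scale(scale)
normalized = apply_inverse([3.5, 0.0], scale, offset)
```

In this sketch the interpolation moves the zero scale entry away from zero, and the clamp then bounds how far any entry may stray from one, so the division in `apply_inverse` is always defined.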