Fast robust inverse transform speaker adapted training using diagonal transformations

We present a new method of speaker adapted training (SAT) that is more robust and faster than previous methods and yields a lower error rate. The method, called inverse transform SAT (IT-SAT), removes the differences between speakers before training rather than modeling those differences during training. We develop several methods to avoid the problems associated with inverting the transformation. In one method, we interpolate the transformation matrix with an identity or diagonal transformation. We also apply constraints to the matrix to avoid estimation problems. Finally, we show that the resulting method is much faster, requires much less disk space, and yields higher accuracy than the original SAT method.
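The interpolation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the affine form x = A y + b, the linear blending rule, the fixed weight, and the function names `interpolate_transform` and `inverse_transform_features` are all assumptions made for illustration. The sketch only shows why pulling a poorly conditioned matrix toward an identity or diagonal matrix can make its inversion safer.

```python
import numpy as np


def interpolate_transform(A, weight, target="identity"):
    """Blend A with the identity or with its own diagonal (assumed rule).

    weight = 0 returns A unchanged; weight = 1 returns the target matrix.
    """
    A = np.asarray(A, dtype=float)
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if target == "identity":
        T = np.eye(A.shape[0])
    elif target == "diagonal":
        T = np.diag(np.diag(A))
    else:
        raise ValueError("target must be 'identity' or 'diagonal'")
    return (1.0 - weight) * A + weight * T


def inverse_transform_features(X, A, b):
    """Undo an assumed affine speaker transform x = A y + b, row by row.

    Solves A y = x - b instead of forming the explicit inverse of A.
    """
    X = np.asarray(X, dtype=float)
    return np.linalg.solve(A, (X - b).T).T


if __name__ == "__main__":
    # A nearly singular matrix standing in for a poorly estimated transform.
    A = np.array([[1.0, 2.0], [2.0, 4.0001]])
    B = interpolate_transform(A, weight=0.5, target="identity")
    print("condition number before:", np.linalg.cond(A))
    print("condition number after: ", np.linalg.cond(B))
```

In this toy case the blended matrix has a far smaller condition number than the original, so small estimation errors are amplified much less when the transform is inverted. How the interpolation weight is actually chosen is not stated in the abstract and is left open here.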