Fast speaker adaptation of a large vocabulary continuous density HMM speech recognizer using a basis transform approach

Maximum likelihood transformation-based adaptation techniques have proven successful, but we believe that faster convergence to speaker-dependent (SD) performance can be achieved by incorporating some form of a priori knowledge into the adaptation process. In this paper, instead of estimating one linear transform per class of models for each new speaker, we transform the speaker-independent (SI) models using multiple linear transforms and a weight vector. To reduce the number of adaptation parameters, the multiple linear transforms are generated from training speakers, and the adaptation parameters consist of a single weight vector per class. This can be seen as incorporating a priori knowledge into the estimation process. Experiments conducted on the Spoken Language Translator database in the Swedish language, using SRI's DECIPHER(TM) system, show that the new method outperforms maximum likelihood linear regression (MLLR) on very limited adaptation data.
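
The following is a minimal sketch of the adaptation step described above, assuming the adapted Gaussian means are formed as a weighted combination of the SI means pushed through each pre-trained basis transform, with one weight vector per regression class. The function names, array shapes, and the exact combination rule are illustrative assumptions, not the paper's formulation; in particular, the per-class weights would themselves be estimated by maximizing the likelihood of the speaker's adaptation data, which is omitted here.

```python
import numpy as np

def adapt_means(si_means, basis_transforms, weights):
    """Adapt speaker-independent (SI) Gaussian means for one regression class.

    si_means          : (n_gauss, d) SI mean vectors of the class
    basis_transforms  : list of K (A_k, b_k) pairs estimated from training
                        speakers; A_k is (d, d), b_k is (d,)
    weights           : (K,) weight vector estimated for the new speaker
    returns           : (n_gauss, d) adapted mean vectors
    """
    adapted = np.zeros_like(si_means)
    for w_k, (A_k, b_k) in zip(weights, basis_transforms):
        # Apply the k-th basis transform to every SI mean, then accumulate
        # it with the speaker-specific weight for this class.
        adapted += w_k * (si_means @ A_k.T + b_k)
    return adapted

# Toy usage with random data, purely to show the shapes involved.
if __name__ == "__main__":
    d, n_gauss, K = 4, 10, 3
    rng = np.random.default_rng(0)
    si_means = rng.normal(size=(n_gauss, d))
    basis = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(K)]
    w = rng.dirichlet(np.ones(K))          # one weight vector for this class
    print(adapt_means(si_means, basis, w).shape)   # -> (10, 4)
```

Because only the K-dimensional weight vector per class is speaker-specific, the number of free parameters estimated from the new speaker's data is far smaller than the d-by-(d+1) entries of a full per-class MLLR transform, which is what allows adaptation from very limited data.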