A basis method for robust estimation of constrained MLLR

Constrained Maximum Likelihood Linear Regression (CMLLR) is a widely used speaker adaptation technique in which an affine transform of the features is estimated for each speaker. However, when very little speech data is available (e.g. a few seconds), it can be difficult to obtain sufficiently accurate estimates of the transform parameters. In this paper we describe a method for estimating CMLLR transforms robustly from less data. We do this by representing the CMLLR transform matrix as a weighted sum over basis matrices, where the basis is constructed so that the most important variation is concentrated in the leading coefficients. Depending on the amount of data available, we can then choose to estimate a smaller or larger number of coefficients.
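
The representation described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the offset term, the rule for choosing the number of coefficients, or the estimation procedure. The sketch assumes the transform is written as W = W0 + sum_b d_b W_b, where W0 = [I; 0] is the identity (no adaptation) transform, and it uses a hypothetical rule of one coefficient per `frames_per_coeff` frames. All function and parameter names are illustrative, and the maximum likelihood estimation of the coefficients d_b is omitted.

```python
def identity_affine(dim):
    """Return the 'no adaptation' transform [I; 0]: dim rows, dim + 1 columns."""
    return [[1.0 if r == c else 0.0 for c in range(dim + 1)] for r in range(dim)]


def num_coefficients(num_frames, dim, frames_per_coeff=5):
    """Hypothetical rule: more data allows more coefficients, capped at dim * (dim + 1)."""
    return min(num_frames // frames_per_coeff, dim * (dim + 1))


def build_transform(basis, coeffs, dim):
    """Form W = W0 + sum_b coeffs[b] * basis[b] from the leading len(coeffs) basis matrices."""
    w = identity_affine(dim)
    for d, w_b in zip(coeffs, basis):
        for r in range(dim):
            for c in range(dim + 1):
                w[r][c] += d * w_b[r][c]
    return w


def apply_transform(w, x):
    """Apply the affine transform to feature vector x by extending x with a trailing 1."""
    x_ext = list(x) + [1.0]
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x_ext)) for row in w]
```

With no coefficients (no adaptation data), `build_transform` returns the identity transform and features pass through unchanged. As more frames become available, more leading basis matrices contribute to the transform.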
