Paper information - A basis method for robust estimation of constrained MLLR

Constrained Maximum Likelihood Linear Regression (CMLLR) is a widely used speaker adaptation technique in which an affine transform of the features is estimated for each speaker. However, when the amount of speech data available is very small (e.g. a few seconds), it can be difficult to get sufficiently accurate estimates of the transform parameters. In this paper we describe a method of estimating CMLLR robustly from less data. We do this by representing the CMLLR transform matrix as a weighted sum over basis matrices, where the basis is constructed in such a way that the most important variation is concentrated in the leading coefficients. Depending on the amount of data available, we can choose to estimate a smaller or larger number of coefficients.
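The idea of a weighted sum over basis matrices, truncated according to the amount of data, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the choice of an identity transform as the starting point, and the `coeffs_per_frame` rule for picking the number of coefficients. Estimating the basis and the coefficients themselves is not shown.

```python
import numpy as np


def num_coefficients(num_frames, dim, coeffs_per_frame=0.2):
    """Hypothetical rule: use more coefficients as more data arrives,
    capped at the dim * (dim + 1) free parameters of an affine transform."""
    return int(min(coeffs_per_frame * num_frames, dim * (dim + 1)))


def build_transform(basis, coeffs, dim):
    """Return W = [I ; 0] + sum_b coeffs[b] * basis[b].

    Only the leading len(coeffs) basis matrices are used, so a short
    coefficient vector yields a more constrained transform.
    Each basis matrix has shape (dim, dim + 1).
    """
    W = np.hstack([np.eye(dim), np.zeros((dim, 1))])  # assumed default: identity
    for d_b, W_b in zip(coeffs, basis):
        W = W + d_b * W_b
    return W


def apply_transform(W, x):
    """Apply the affine transform W = [A ; b] to a feature vector x."""
    return W[:, :-1] @ x + W[:, -1]
```

With no coefficients the sketch falls back to the identity transform, which mirrors the intuition that a speaker with almost no data should receive almost no adaptation; as `num_frames` grows, more of the basis is brought in.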

Kaisheng Yao | Daniel Povey

[1] Kai Feng, et al. A novel estimation of feature-space MLLR for full-covariance models, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2] Vaibhava Goel, et al. Structuring linear transforms for adaptation using training time information, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3] Florent Perronnin, et al. Very fast adaptation with a compact context-dependent eigenvoice model, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] Kaisheng Yao, et al. A basis representation of constrained MLLR transforms for robust adaptation, 2012, Comput. Speech Lang.

[5] K. Visweswariah, et al. Maximum likelihood training of bases for rapid adaptation, 2002.

[6] Vassilios Digalakis, et al. Speaker adaptation using constrained estimation of Gaussian mixtures, 1995, IEEE Trans. Speech Audio Process.

[7] Lin-Shan Lee, et al. Fast speaker adaptation using eigenspace-based maximum likelihood linear regression, 2000, INTERSPEECH.

[8] Mark J. F. Gales, et al. Maximum likelihood linear transformations for HMM-based speech recognition, 1998, Comput. Speech Lang.

[9] Xiaodong He, et al. Robust feature space adaptation for telephony speech recognition, 2006, INTERSPEECH.

[10] Jing Huang, et al. Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition, 2005, 2005 IEEE International Conference on Multimedia and Expo.