Paper information - Maximum Likelihood Linear Regression

Maximum likelihood linear regression (MLLR) is an adaptation technique suitable for both speaker and environmental model-based adaptation. The models are adapted using a set of linear transformations, estimated in a maximum likelihood fashion from the available adaptation data. As these transformations can capture general relationships between the original model set and the current speaker or new acoustic environment, they can be effective in adapting all the HMM distributions with limited adaptation data. Two important decisions must be made: (i) how to cluster components together so that they all have a similar transformation matrix, and (ii) how many transformation matrices to generate for a given block of adaptation data. This paper addresses both problems. Firstly, it describes two clustering techniques that are optimal in the sense of maximising the likelihood of the adaptation data. The first assigns each component to one of the regression classes, and may be used to generate standard regression class trees. The second performs a fuzzy assignment of base classes to regression classes, so that the transformation associated with each component is a linear combination of a set of transformations. Secondly, two schemes are examined for determining the number of regression classes, and hence transforms, for a given amount of adaptation data: a cross-validation scheme based on the auxiliary function of the adaptation data, and a scheme based on iterative MLLR. Neither scheme requires a-priori threshold information. An initial evaluation of the techniques was performed using the ARPA 1994 test data. On this task, although "good" trees were generated in terms of the likelihood of the adaptation training data, neither of the optimal clustering schemes yielded gains in recognition performance.
The performance of the cross-validation scheme was found to be comparable to that of an empirically determined threshold scheme. The best performance was achieved using iterative MLLR, which outperformed both the fixed-class and the threshold-based schemes.
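The abstract does not spell out how a single MLLR transform is estimated. The following is a minimal sketch of the standard closed-form mean-only MLLR update for one regression class, assuming diagonal-covariance Gaussians and component occupation probabilities gamma_m(t) that have already been computed (for example by forward-backward over the adaptation transcription). Each adapted mean is mu_hat_m = W xi_m with the extended mean vector xi_m = [1, mu_m], and with diagonal covariances each row w_i of W solves G_i w_i = k_i. All function names are hypothetical, this is not the paper's code, and the auxiliary-function helper only illustrates the quantity a cross-validation scheme could compare on held-out data; the paper's exact procedure is not reproduced here.

```python
import numpy as np


def extended_means(means):
    """Prepend a bias term of 1 to each mean: xi_m = [1, mu_m]."""
    return np.hstack([np.ones((means.shape[0], 1)), means])


def estimate_mllr_mean_transform(obs, gammas, means, variances):
    """Closed-form ML estimate of one MLLR mean transform W of shape (d, d+1).

    Assumes diagonal covariances, so each row w_i of W solves G_i w_i = k_i.
    obs:       (T, d) adaptation frames o(t)
    gammas:    (T, M) occupation probabilities gamma_m(t), assumed precomputed
    means:     (M, d) original means mu_m, all in the same regression class
    variances: (M, d) diagonal variances sigma^2_{m,i}
    """
    d = obs.shape[1]
    xi = extended_means(means)              # (M, d+1)
    occ = gammas.sum(axis=0)                # (M,) total occupancy of each component
    acc = gammas.T @ obs                    # (M, d) sum_t gamma_m(t) o(t)
    W = np.zeros((d, d + 1))
    for i in range(d):
        inv_var = 1.0 / variances[:, i]     # (M,) 1 / sigma^2_{m,i}
        # G_i = sum_m occ_m / sigma^2_{m,i} * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k_i = sum_m (sum_t gamma_m(t) o_i(t)) / sigma^2_{m,i} * xi_m
        k = (acc[:, i] * inv_var) @ xi
        W[i] = np.linalg.solve(G, k)
    return W


def adapt_means(W, means):
    """Adapted means mu_hat_m = W xi_m for every component."""
    return extended_means(means) @ W.T


def auxiliary_function(obs, gammas, means, variances):
    """EM auxiliary function Q for diagonal Gaussians, up to an additive constant."""
    diff = obs[:, None, :] - means[None, :, :]                # (T, M, d)
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)    # (T, M)
    log_det = np.log(variances).sum(axis=1)                   # (M,)
    return -0.5 * float((gammas * (log_det[None, :] + maha)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, d = 5, 2
    means = rng.normal(size=(M, d))
    variances = rng.uniform(0.5, 2.0, size=(M, d))
    # Synthetic "new speaker": frames drawn around means moved by a known transform.
    W_true = np.array([[0.5, 1.2, 0.1], [-0.3, 0.2, 0.9]])
    comp = np.repeat(np.arange(M), 20)                        # hard alignment, 20 frames each
    obs = adapt_means(W_true, means)[comp] + 0.1 * rng.normal(size=(comp.size, d))
    gammas = np.eye(M)[comp]

    W = estimate_mllr_mean_transform(obs, gammas, means, variances)
    print("estimated W:\n", W.round(2))
    print("Q before adaptation:", auxiliary_function(obs, gammas, means, variances))
    print("Q after adaptation: ", auxiliary_function(obs, gammas, adapt_means(W, means), variances))
```

Because the covariances are diagonal, the estimation decouples into d small (d+1)-dimensional linear systems, one per row of W. Each G_i is singular unless at least d+1 components with linearly independent extended means have occupancy in the class, which is one concrete reason the number of regression classes has to be matched to the amount of adaptation data.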

M.J.F. Gales
