Mean and variance adaptation within the MLLR framework

Abstract

A key issue for adaptation algorithms is how to modify a large number of parameters using only a small amount of adaptation data. Speaker adaptation techniques aim to obtain near speaker-dependent (SD) performance from only small amounts of speaker-specific data, and are often built on an initial speaker-independent (SI) recognition system. Some of these techniques may also be applied to adaptation to a new acoustic environment. In this case an SI recognition system, typically trained in a clean acoustic environment, is adapted to operate in a new, noise-corrupted acoustic environment. This paper examines the maximum likelihood linear regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters so as to maximize the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture-Gaussian HMM systems. In this paper MLLR is extended to also update the Gaussian variances, and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. Mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation.
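As a rough illustration of what "linear transformations for groups of model parameters" can mean in practice, the sketch below applies one shared affine transform to the means of every Gaussian in a group, plus a per-dimension variance scaling. It covers only how a transform is applied. Estimating the transform to maximize the likelihood of the adaptation data is not shown, and the diagonal variance scaling is a simplifying assumption rather than the variance transform derived in the paper. The names `adapt_mean`, `adapt_diag_variance`, `W`, and `scale`, and all numeric values, are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def adapt_mean(W: Matrix, mean: Sequence[float]) -> Vector:
    """Apply an affine mean transform: mu_hat = W @ [1, mu_1, ..., mu_n]."""
    extended = [1.0, *mean]  # extended mean vector, bias term first
    if any(len(row) != len(extended) for row in W):
        raise ValueError("each row of W must have len(mean) + 1 entries")
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


def adapt_diag_variance(scale: Sequence[float], variance: Sequence[float]) -> Vector:
    """Simplified (assumed) diagonal update: var_hat[i] = scale[i] * var[i]."""
    if len(scale) != len(variance):
        raise ValueError("scale and variance must have the same length")
    if any(s <= 0 for s in scale):
        raise ValueError("scale factors must be positive to keep variances valid")
    return [s * v for s, v in zip(scale, variance)]


# One transform is shared by every Gaussian in the (hypothetical) group,
# so few transform parameters move many model parameters at once.
W = [[0.5, 1.0, 0.0],    # first column is the bias, the rest is the matrix part
     [-1.0, 0.0, 2.0]]
means = [[1.0, 2.0], [0.0, -1.0]]

adapted_means = [adapt_mean(W, m) for m in means]
adapted_var = adapt_diag_variance([2.0, 0.5], [1.0, 4.0])
```

Sharing one transform across a group is what lets a small amount of adaptation data update many Gaussians: the data only has to support the transform's entries, not every individual mean and variance.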
