Maximum-likelihood stochastic-transformation adaptation of hidden Markov models

The recognition accuracy of large-vocabulary automatic speech recognition (ASR) systems depends strongly on the mismatch between the training and testing conditions. For example, dialect differences between the training and testing speakers cause a significant degradation in recognition performance. Several popular adaptation approaches improve the performance of recognizers based on hidden Markov models with continuous mixture densities by applying linear transformations to the means, and possibly the covariances, of the mixture Gaussians. The linear assumption, however, is too restrictive. In this paper we propose a novel adaptation technique that adapts the means and, optionally, the covariances of the mixture Gaussians using multiple stochastic transformations. We perform both speaker and dialect adaptation experiments and show that our method significantly improves the recognition accuracy and the robustness of our system. The experiments are carried out with SRI's DECIPHER speech recognition system.
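To make the contrast with a single linear transformation concrete, the following is a minimal sketch of one way a "multiple stochastic transformations" density can be formed. Each Gaussian is replaced by a mixture of transformed copies of itself, the copy produced by transform j being weighted by a probability lambda_j. The transformation probabilities are then re-estimated with one maximum-likelihood (EM) step. This is an illustration under stated assumptions, not the method as implemented in DECIPHER. It assumes diagonal covariances and diagonal affine transforms (a_j, b_j). It also assumes a single group of Gaussians sharing the same transforms and frames already assigned to that group, with no HMM state alignment. Finally, it re-estimates only the lambda_j while holding the transforms fixed. All function and type names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: diagonal Gaussians, diagonal affine transforms.
Vector = List[float]
Transform = Tuple[Vector, Vector]  # (a, b): x -> a * x + b, elementwise


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian N(x; mean, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def transform_gaussian(mean, var, a, b) -> Tuple[Vector, Vector]:
    """Diagonal affine transform of a Gaussian: mu' = a*mu + b, sigma'^2 = a^2 * sigma^2."""
    new_mean = [ai * m + bi for ai, m, bi in zip(a, mean, b)]
    new_var = [ai * ai * v for ai, v in zip(a, var)]
    return new_mean, new_var


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def joint_log_terms(x, weights, means, variances,
                    transforms: Sequence[Transform], lambdas) -> List[List[float]]:
    """log(w_i * lambda_j * N(x; a_j*mu_i + b_j, a_j^2 * sigma_i^2)) for every pair (i, j).

    All mixture weights w_i and transform probabilities lambda_j must be > 0.
    """
    rows = []
    for w, mu, var in zip(weights, means, variances):
        row = []
        for lam, (a, b) in zip(lambdas, transforms):
            t_mean, t_var = transform_gaussian(mu, var, a, b)
            row.append(math.log(w) + math.log(lam) + log_gauss_diag(x, t_mean, t_var))
        rows.append(row)
    return rows


def st_log_density(x, weights, means, variances, transforms, lambdas) -> float:
    """Adapted log density: each Gaussian becomes a lambda-weighted mixture of transformed copies."""
    rows = joint_log_terms(x, weights, means, variances, transforms, lambdas)
    return logsumexp([t for row in rows for t in row])


def reestimate_lambdas(data, weights, means, variances, transforms, lambdas) -> List[float]:
    """One EM update of the transform probabilities, holding every (a_j, b_j) fixed.

    lambda_j <- (1/T) * sum_t sum_i P(Gaussian i, transform j | x_t)
    """
    counts = [0.0] * len(lambdas)
    for x in data:
        rows = joint_log_terms(x, weights, means, variances, transforms, lambdas)
        total = logsumexp([t for row in rows for t in row])
        for row in rows:
            for j, t in enumerate(row):
                counts[j] += math.exp(t - total)  # posterior of (i, j) given frame x
    n = sum(counts)  # equals len(data) up to rounding
    return [c / n for c in counts]


if __name__ == "__main__":
    # Toy 1-D example: one Gaussian N(0, 1), an identity transform and a +5 shift.
    weights, means, variances = [1.0], [[0.0]], [[1.0]]
    transforms = [([1.0], [0.0]), ([1.0], [5.0])]
    data = [[5.0], [4.8], [5.2]]  # adaptation frames clustered near 5
    print(reestimate_lambdas(data, weights, means, variances, transforms, [0.5, 0.5]))
```

With a single transform and lambda = 1, `st_log_density` reduces to ordinary affine adaptation of the means and variances. Several transforms with learned probabilities give the adapted density extra flexibility that one linear mapping cannot express. In the toy run, the adaptation frames near 5 push almost all of the probability mass onto the shift transform.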
