Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models

This paper investigates Bayesian learning of the parameters of a multivariate Gaussian mixture density. In a continuous density hidden Markov model (CDHMM) framework, Bayesian learning serves as a unified approach to parameter smoothing, speaker adaptation, speaker clustering, and corrective training. The goal of this study is to improve recognition performance by making the models in a CDHMM-based speech recognition system more robust. Our approach is to use Bayesian learning to incorporate prior knowledge into the CDHMM training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented, and preliminary results on HMM parameter smoothing, speaker adaptation, and speaker clustering are given. Performance improvements were observed in tests on the DARPA Resource Management (RM) task. For speaker adaptation, under a supervised learning mode with 2 minutes of speaker-specific training data, a 31% reduction in word error rate was obtained relative to the speaker-independent results. Using Bayesian learning for HMM parameter smoothing and sex-dependent modeling, a 21% error reduction was observed on the FEB91 test.
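
To illustrate how a prior density can enter parameter estimation, the following is a minimal sketch of maximum a posteriori (MAP) estimation for the simplest case only: the mean of a single scalar Gaussian with known variance, under a conjugate normal prior. It is not the paper's full procedure for mixture densities in a CDHMM. The function name and the `prior_weight` parameter are assumptions introduced here for illustration; `prior_weight` plays the role of a count of pseudo-observations placed at the prior mean (for example, a speaker-independent mean).

```python
def map_gaussian_mean(samples, prior_mean, prior_weight):
    """MAP estimate of a scalar Gaussian mean with known variance.

    Assumes a conjugate normal prior centered at prior_mean whose variance
    is the data variance divided by prior_weight (hypothetical parameter
    name). prior_weight must be positive; it behaves like a number of
    pseudo-observations located at prior_mean.
    """
    if prior_weight <= 0:
        raise ValueError("prior_weight must be positive")
    n = len(samples)
    # Weighted combination of the prior mean and the observed data.
    return (prior_weight * prior_mean + sum(samples)) / (prior_weight + n)


# With no data the estimate stays at the prior mean; as more data arrive
# it moves toward the sample mean.
no_data = map_gaussian_mean([], prior_mean=0.0, prior_weight=4.0)
some_data = map_gaussian_mean([1.0, 1.0, 1.0, 1.0], prior_mean=0.0, prior_weight=4.0)
```

In this sketch, a small amount of adaptation data shifts the estimate only part of the way from the prior mean to the sample mean, which is the smoothing behavior that motivates using priors when speaker-specific data are scarce.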
