Adaptive Learning in Acoustic and Language Modeling

ABSTRACT We present a mathematical framework for Bayesian adaptive learning of the parameters of stochastic models. Maximum a posteriori (MAP) estimation algorithms are developed for hidden Markov models and for a number of useful models commonly used in automatic speech recognition and natural language processing. The MAP formulation offers a way to combine existing prior knowledge and a small set of newly acquired task-specific data in an optimal manner. It is therefore ideal for adaptive learning applications such as speaker and task adaptation.
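
To illustrate how a MAP estimate blends prior knowledge with a small amount of new data, here is a minimal sketch for the simplest possible case: the mean of a single Gaussian with known variance and a conjugate Gaussian prior on that mean. It is not the HMM algorithm developed in the paper. The function name `map_mean` and the parameters `prior_mean` and `tau` are hypothetical and chosen only for illustration.

```python
def map_mean(prior_mean, tau, samples):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    Assumes the observation variance is known and the prior on the mean is
    Gaussian with variance (observation variance / tau). Under that
    assumption, tau behaves like a pseudo-count: the weight given to
    prior_mean, measured in units of observations.
    """
    n = len(samples)
    # Weighted average of the prior mean and the observed data.
    return (tau * prior_mean + sum(samples)) / (tau + n)


# Hypothetical numbers: a prior mean of 0.0 (e.g. from a speaker-independent
# model) and three adaptation observations from a new speaker.
adapted = map_mean(prior_mean=0.0, tau=2.0, samples=[1.0, 1.2, 0.8])
```

With no adaptation data the estimate equals the prior mean, and as the number of observations grows it approaches the sample mean. This is the sense in which the prior matters most when task-specific data are scarce.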
