A Bayesian prediction approach to robust speech recognition and online environmental learning

Abstract A robust speech recognizer is developed to tackle the inevitable mismatch between training and testing environments. Because realistic environments are uncertain and nonstationary, it is necessary to characterize the uncertainty of the speech hidden Markov models (HMMs) used for recognition and to track that uncertainty incrementally so as to capture the newest environmental statistics. In this paper, we develop a new Bayesian predictive classification (BPC) approach for robust decision and online environmental learning. The BPC decision is established by modeling the uncertainties of both the HMM mean vector and the precision matrix with a conjugate prior density. Frame-based predictive distributions, in the form of multivariate t distributions and approximate Gaussian distributions, are exploited. After recognition, the prior density is pooled with the likelihood of the current test sentence to generate a reproducible prior density. The hyperparameters of the prior density are accordingly adjusted to match the newest environment and are applied to the recognition of upcoming data. As a result, an efficient online unsupervised learning strategy is developed for HMM-based speech recognition without the need for adaptation data. In experiments on the recognition of connected Chinese digits in hands-free car environments, the proposed approach performs significantly better than the conventional plug-in maximum a posteriori (MAP) decision. The approach is also computationally economical.
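The two steps described in the abstract (scoring a frame with a predictive distribution, then folding the recognized data back into the prior) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single scalar Gaussian with a textbook normal-gamma conjugate prior on its mean and precision, whereas the paper works with HMM mean vectors and precision matrices and multivariate t distributions. The names `NormalGamma`, `predictive_log_pdf`, and `update` are hypothetical, and frames are assumed to be already hard-assigned to this one Gaussian.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGamma:
    """Hyperparameters of a normal-gamma prior (illustrative names).

    mean | precision ~ Normal(m, 1 / (kappa * precision))
    precision        ~ Gamma(shape=alpha, rate=beta)
    """
    m: float
    kappa: float
    alpha: float
    beta: float


def predictive_log_pdf(prior: NormalGamma, x: float) -> float:
    """Log density of x with mean and precision integrated out.

    Under a normal-gamma prior the predictive is a Student t with
    2 * alpha degrees of freedom, location m, and squared scale
    beta * (kappa + 1) / (alpha * kappa).
    """
    nu = 2.0 * prior.alpha
    scale2 = prior.beta * (prior.kappa + 1.0) / (prior.alpha * prior.kappa)
    z2 = (x - prior.m) ** 2 / (nu * scale2)
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * scale2)
            - 0.5 * (nu + 1.0) * math.log1p(z2))


def update(prior: NormalGamma, frames: list[float]) -> NormalGamma:
    """Pool the prior with the likelihood of the given frames.

    The result has the same functional form as the prior, so it can
    serve as the prior for the next utterance.
    """
    n = len(frames)
    if n == 0:
        return prior
    xbar = sum(frames) / n
    scatter = sum((x - xbar) ** 2 for x in frames)
    kappa_n = prior.kappa + n
    return NormalGamma(
        m=(prior.kappa * prior.m + n * xbar) / kappa_n,
        kappa=kappa_n,
        alpha=prior.alpha + n / 2.0,
        beta=(prior.beta + 0.5 * scatter
              + prior.kappa * n * (xbar - prior.m) ** 2 / (2.0 * kappa_n)),
    )
```

In use, one would score each frame of a test utterance with `predictive_log_pdf`, pick the best hypothesis, and then call `update` with the frames aligned to this Gaussian so that the next utterance is scored with the adjusted hyperparameters. No separate adaptation data are involved in this loop, which matches the unsupervised setting the abstract describes.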
