Paper information - REMAP-experiments with speech recognition

REMAP-experiments with speech recognition

We present experimental and theoretical results using a framework for training and modeling continuous speech recognition systems based on the theoretically optimal maximum a posteriori (MAP) criterion. This contrasts with most state-of-the-art systems, which are trained according to a maximum likelihood (ML) criterion. Although the algorithm is quite general, we applied it to a particular form of hybrid system combining hidden Markov models (HMMs) and artificial neural networks (ANNs), in which the ANN targets and weights are iteratively reestimated to guarantee an increase in the posterior probability of the correct model, hence actually minimizing the error rate. More specifically, this training approach is applied to a transition-based model that uses local conditional transition probabilities (i.e., the posterior probability of the current state given the current acoustic vector and the previous state) to estimate the posterior probabilities of sentences. Experimental results on isolated and continuous speech recognition tasks show an increase in the estimated posterior probabilities of the correct sentences after training, and significant decreases in error rates compared with a baseline system.
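To make the role of the local conditional transition probabilities concrete, the following is a minimal sketch (not the authors' implementation) of how such local posteriors could be combined into a posterior for a whole model by a forward recursion over state paths. It assumes that a path's posterior is the product of its local terms p(q_t | x_t, q_{t-1}) and that a model is defined simply by the set of states in which its paths may end. The function name `model_posterior`, the two-state setup, and all numeric values are hypothetical; in the hybrid system described above the local probabilities would come from the ANN, and language-model and prior terms are ignored here.

```python
def model_posterior(initial, local, final_states):
    """Sum, over all state paths ending in `final_states`, of the product of
    local conditional transition probabilities p(q_t | x_t, q_{t-1}).

    initial[k]     : assumed p(q_1 = k | x_1) for the first frame
    local[t][j][k] : assumed p(q_t = k | x_t, q_{t-1} = j) for later frames
    """
    n_states = len(initial)
    alpha = list(initial)  # alpha[k]: posterior mass of partial paths ending in state k
    for frame in local:
        alpha = [
            sum(alpha[j] * frame[j][k] for j in range(n_states))
            for k in range(n_states)
        ]
    return sum(alpha[k] for k in final_states)


# Made-up local posteriors for a 3-frame utterance and 2 states.
# Each row frame[j] sums to 1, as a softmax ANN output would.
initial = [0.6, 0.4]
local = [
    [[0.7, 0.3], [0.2, 0.8]],  # frame 2
    [[0.9, 0.1], [0.5, 0.5]],  # frame 3
]

# Two hypothetical competing models: A ends in state 0, B ends in state 1.
post_a = model_posterior(initial, local, final_states={0})
post_b = model_posterior(initial, local, final_states={1})
print(post_a, post_b)
```

With these made-up numbers the two posteriors come out at roughly 0.7 and 0.3. Because every local distribution is normalized over the next state, the posteriors of models that partition the set of paths sum to one, so raising the posterior of the correct model necessarily lowers that of its competitors. This is the discriminant property that separates a MAP criterion from an ML one.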
