ON THE USE OF FRAME-LEVEL INFORMATION CUES FOR MINIMUM PHONE ERROR TRAINING OF ACOUSTIC MODELS

This paper considers discriminative training of acoustic models for Mandarin large vocabulary continuous speech recognition. Two frame-level information cues were explored and integrated into minimum phone error (MPE) training. First, the frame-level entropy of the Gaussian posterior probabilities obtained from the word lattice of each training utterance was used to weight the frame-level statistics of MPE training. The entropy serves to further emphasize or de-emphasize the training statistics associated with plausibly correct and competing models, for better discrimination. Second, a new phone accuracy function was presented, based on the frame-level accuracy of hypothesized phone arcs, in place of the raw phone accuracy function of standard MPE training. The underlying characteristics of the presented approaches were investigated extensively, and their performance was compared with that of the original MPE training approach. Experiments conducted on broadcast news collected in Taiwan showed that the presented approaches achieved slight but consistent improvements over the baseline system.
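The abstract does not give the exact weighting function, so the following Python sketch is illustrative only. It computes the entropy of the Gaussian posteriors at a frame, maps that entropy to a weight through a hypothetical linear function (`entropy_weight`, with an assumed scale `alpha`), and applies the weight when accumulating a Gaussian's occupancy and first-order statistics. All function names and the weighting scheme are assumptions, not the paper's formulation.

```python
import math
from typing import List, Sequence, Tuple

def frame_entropy(posteriors: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the Gaussian posteriors at one frame.

    The posteriors are renormalised to sum to one, so lattice occupancies
    that do not sum exactly to one can be passed in directly.
    """
    total = sum(posteriors)
    if total <= 0.0:
        return 0.0
    h = 0.0
    for g in posteriors:
        p = g / total
        if p > 0.0:  # 0 * log 0 is taken as 0
            h -= p * math.log(p)
    return h

def entropy_weight(h: float, num_gaussians: int, alpha: float = 0.5) -> float:
    """Hypothetical mapping from frame entropy to a statistics weight.

    The entropy is normalised by its maximum, log(num_gaussians), to [0, 1].
    With alpha > 0, low-entropy (confident) frames get weights above 1 and
    high-entropy frames get weights below 1; alpha < 0 reverses this.
    Keep |alpha| < 1 so that every weight stays positive.
    """
    if num_gaussians <= 1:
        return 1.0
    h_norm = h / math.log(num_gaussians)
    return 1.0 + alpha * (1.0 - 2.0 * h_norm)

def weighted_gaussian_stats(
    gammas: Sequence[float],
    frames: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> Tuple[float, List[float]]:
    """Accumulate weighted zeroth- and first-order statistics for one Gaussian.

    gammas[t] is the Gaussian's posterior at frame t (for example the
    numerator or denominator part of the MPE statistics), frames[t] is the
    feature vector and weights[t] is the entropy-based frame weight.
    """
    occ = 0.0
    first = [0.0] * len(frames[0])
    for g, o, w in zip(gammas, frames, weights):
        occ += w * g
        for d, x in enumerate(o):
            first[d] += w * g * x
    return occ, first
```

The sign of `alpha` is left open because the abstract says only that statistics are emphasized or de-emphasized, not which frames receive which treatment.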
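For comparison, the approximate phone accuracy of the original MPE criterion (Povey and Woodland, 2002) scores a hypothesized phone arc q against each reference phone z as -1 + 2e(q,z) when the labels match and -1 + e(q,z) otherwise, where e(q,z) is the fraction of z's duration overlapped by q, and takes the maximum over z. The abstract does not define its frame-level alternative, so `frame_phone_accuracy` below uses one hypothetical choice: the fraction of the arc's frames whose frame-level reference label (for example, from a forced alignment) matches the arc's phone. The segment representation and all function names are assumptions made for this sketch.

```python
from typing import List, Sequence, Tuple

# A phone segment: (label, start_frame, end_frame), end frame exclusive.
Segment = Tuple[str, int, int]

def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of frames shared by two half-open frame intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def raw_phone_accuracy(hyp: Segment, ref: Sequence[Segment]) -> float:
    """Approximate phone accuracy of the original MPE criterion.

    For each (non-empty) reference phone z, e is the fraction of z's length
    overlapped by the hypothesis; the score is -1 + 2e for a matching label
    and -1 + e otherwise. The maximum over z is returned, so a hypothesis
    that overlaps no reference phone scores -1.
    """
    label, start, end = hyp
    best = -1.0
    for z_label, z_start, z_end in ref:
        e = overlap(start, end, z_start, z_end) / (z_end - z_start)
        score = -1.0 + 2.0 * e if z_label == label else -1.0 + e
        best = max(best, score)
    return best

def segments_to_frames(ref: Sequence[Segment]) -> List[str]:
    """Expand contiguous, time-ordered reference segments starting at
    frame 0 into one label per frame."""
    frames: List[str] = []
    for label, start, end in ref:
        frames.extend([label] * (end - start))
    return frames

def frame_phone_accuracy(hyp: Segment, ref_frames: Sequence[str]) -> float:
    """Hypothetical frame-level accuracy: the fraction of the arc's frames
    whose reference label equals the arc's phone label (range [0, 1])."""
    label, start, end = hyp
    if end <= start:
        return 0.0
    matches = sum(1 for t in range(start, end) if ref_frames[t] == label)
    return matches / (end - start)
```

With a reference of `a` on frames 0-3 and `b` on frames 4-9, an arc labelled `a` spanning frames 0-5 receives full raw accuracy (1.0) because it covers all of reference `a`, whereas the frame-level sketch gives it 4/6, penalizing the two frames that spill into `b`.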
