Interpolation of n-gram and mutual-information based trigger pair language models for Mandarin speech recognition

While n-gram modeling is simple and dominant in speech recognition, it captures only the short-distance context dependency within an n-word window, where the largest practical n for natural language is currently three. However, many context dependencies in natural language occur beyond a three-word window. This paper proposes a new language modeling approach that captures the preferred relationships between words over both short and long distances through the concept of MI-Trigger pairs. Different MI-Trigger-based models are constructed, in either a distance-dependent or a distance-independent way, within windows of 1 to 10 words. The new MI-Trigger-based modeling is also compared and merged with word bigram modeling. It is found that MI-Trigger-based modeling outperforms word bigram modeling, and that the n-gram and MI-Trigger models complement each other well: their proper merging further increases the recognition rate when tested on Mandarin speech recognition. One advantage of MI-Trigger-based modeling is that it requires far fewer parameters than word bigram modeling. Another is that the number of trigger pairs in an MI-Trigger model can be kept to a reasonable size without losing much of its modeling power. © 1999 Academic Press
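The abstract does not spell out the selection or combination formulas, so the Python sketch below illustrates one plausible reading: candidate trigger pairs are ranked by a mutual-information score over a 1-to-10-word window, and the resulting trigger model is linearly interpolated with a word bigram model. The pointwise-MI scoring, the trigger-probability normalization, and the interpolation weight `lam` are assumptions made for illustration, not the paper's exact method (Rosenfeld-style average MI and back-off smoothing are common alternatives).

```python
import math
from collections import Counter


def collect_pairs(tokens, max_dist=10):
    """Count ordered (trigger, target) pairs co-occurring within a
    window of 1..max_dist words (distance-independent counting)."""
    pairs = Counter()
    for i in range(len(tokens)):
        for b in tokens[i + 1 : i + 1 + max_dist]:
            pairs[(tokens[i], b)] += 1
    return pairs


def select_mi_triggers(tokens, max_dist=10, top_k=10000):
    """Score candidate pairs by a simple joint-probability-weighted
    pointwise MI and keep the top_k pairs as the MI-Trigger set.
    (Assumed criterion; the paper may use average MI instead.)"""
    unigrams = Counter(tokens)
    pairs = collect_pairs(tokens, max_dist)
    n = len(tokens)
    total = sum(pairs.values())
    scored = {
        (a, b): (c / total)
        * math.log((c / total) / ((unigrams[a] / n) * (unigrams[b] / n)))
        for (a, b), c in pairs.items()
    }
    return dict(sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_k])


def trigger_prob(w, history, triggers, vocab, max_dist=10):
    """P_trigger(w | history): trigger scores of w fired by the last
    max_dist words, renormalized over the vocabulary; falls back to
    uniform when no trigger fires. (Assumed normalization scheme.)"""
    window = history[-max_dist:]

    def score(v):
        return sum(triggers.get((a, v), 0.0) for a in window)

    z = sum(score(v) for v in vocab)
    return score(w) / z if z > 0 else 1.0 / len(vocab)


def interpolated_prob(w, history, bigram, triggers, vocab, lam=0.7):
    """Linear interpolation of the two models:
    P(w|h) = lam * P_bigram(w | h[-1]) + (1 - lam) * P_trigger(w | h).
    `bigram` is a dict of conditional probabilities keyed by (prev, w);
    the uniform fallback stands in for proper back-off smoothing."""
    p_bi = bigram.get((history[-1], w), 1.0 / len(vocab))
    return lam * p_bi + (1.0 - lam) * trigger_prob(w, history, triggers, vocab)


if __name__ == "__main__":
    # Toy demo on a tiny token sequence, just to exercise the pipeline.
    tokens = ("the model predicts the next word from the history and "
              "the trigger pairs extend the history window").split()
    vocab = sorted(set(tokens))
    triggers = select_mi_triggers(tokens, max_dist=10, top_k=50)
    counts = Counter(zip(tokens, tokens[1:]))
    ctx = Counter(tokens[:-1])
    bigram = {(a, b): c / ctx[a] for (a, b), c in counts.items()}
    print(interpolated_prob("word", tokens[:5], bigram, triggers, vocab))
```

A distance-dependent variant would simply keep a separate pair count (and MI score) per distance d = 1..10 instead of pooling all distances into one counter.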
