Modeling long distance dependence in language: topic mixtures versus dynamic cache models

Standard statistical language models use n-grams to capture local dependencies, or dynamic modeling techniques to track dependencies within an article. In this paper, we investigate a new statistical language model that captures topic-related dependencies of words within and across sentences. First, we develop a topic-dependent, sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic adaptation techniques within the framework of the mixture model, using n-gram caches and content-word unigram caches. Experiments with the static (unadapted) mixture model on the North American Business (NAB) task show a 21% reduction in perplexity and a 3-4% improvement in recognition accuracy over a general n-gram model, a larger gain than that obtained with supervised dynamic cache modeling. Further experiments on the Switchboard corpus also show a small improvement in performance with the sentence-level mixture model. Cache modeling techniques introduced in the mixture framework contribute a further 14% reduction in perplexity and a small improvement in recognition accuracy on the NAB task, for both supervised and unsupervised adaptation.
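
As a brief sketch of the sentence-level mixture idea (the notation here is our own illustration, assuming trigram topic components and a single cache interpolation weight, neither of which is specified above): the probability of a sentence w_1 ... w_N is a weighted combination of m topic-dependent n-gram models,

\[ P(w_1^N) \;=\; \sum_{k=1}^{m} \lambda_k \prod_{i=1}^{N} p_k(w_i \mid w_{i-1}, w_{i-2}), \]

and cache-based adaptation can be viewed as interpolating each topic component with a dynamically updated cache distribution,

\[ p_k^{\text{adapted}}(w_i \mid w_{i-1}, w_{i-2}) \;=\; (1 - \gamma)\, p_k(w_i \mid w_{i-1}, w_{i-2}) \;+\; \gamma\, p_{\text{cache}}(w_i), \]

where the \lambda_k are topic mixture weights applied at the sentence level and \gamma is a hypothetical interpolation weight for the cache term.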
