Modeling long distance dependence in language: topic mixtures versus dynamic cache models

Standard statistical language models use n-grams to capture local dependencies, or dynamic modeling techniques to track dependencies within an article. In this paper, we investigate a new statistical language model that captures topic-related dependencies of words within and across sentences. First, we develop a topic-dependent, sentence-level mixture language model that takes advantage of the topic constraints in a sentence or article. Second, we introduce topic-dependent dynamic adaptation techniques within the framework of the mixture model, using n-gram caches and content-word unigram caches. Experiments with the static (unadapted) mixture model on the North American Business (NAB) task show a 21% reduction in perplexity and a 3-4% improvement in recognition accuracy over a general n-gram model, a larger gain than that obtained with supervised dynamic cache modeling. Further experiments on the Switchboard corpus also show a small improvement in performance with the sentence-level mixture model. Cache modeling techniques introduced in the mixture framework contribute a further 14% reduction in perplexity and a small improvement in recognition accuracy on the NAB task, for both supervised and unsupervised adaptation.
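
As a brief sketch of the sentence-level mixture idea (the notation here is our own illustration, assuming trigram topic components and a single cache interpolation weight, neither of which is specified above): the probability of a sentence w_1 ... w_N is a weighted combination of m topic-dependent n-gram models,

\[ P(w_1^N) \;=\; \sum_{k=1}^{m} \lambda_k \prod_{i=1}^{N} p_k(w_i \mid w_{i-1}, w_{i-2}), \]

and cache-based adaptation can be viewed as interpolating each topic component with a dynamically updated cache distribution,

\[ p_k^{\text{adapted}}(w_i \mid w_{i-1}, w_{i-2}) \;=\; (1 - \gamma)\, p_k(w_i \mid w_{i-1}, w_{i-2}) \;+\; \gamma\, p_{\text{cache}}(w_i), \]

where the \lambda_k are topic mixture weights applied at the sentence level and \gamma is a hypothetical interpolation weight for the cache term.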
