Language Model Adaptation

This paper reviews methods for language model adaptation. It first introduces the adaptation paradigms and basic methods, then presents the underlying theory for maximum a posteriori estimation, mixture-based adaptation, and minimum discrimination information. Models that cope with long-distance dependencies are also introduced. Finally, applications and results from the recent literature are surveyed.
