Language Models

A language model assigns a probability to a piece of unseen text, based on statistics estimated from training data. For example, a language model estimated from a large English newspaper archive is expected to assign a higher probability to “a bit of text” than to “aw pit tov tags”, because the words in the former phrase (or the word pairs or word triples, if so-called n-gram models are used) occur more frequently in the training data than the words in the latter phrase. In information retrieval, a typical approach is to build a language model for each document. At search time, the top-ranked document is the one whose language model assigns the highest probability to the query.
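The ranking idea above can be illustrated with a minimal sketch. It assumes unigram models (single words rather than word pairs or triples), whitespace tokenization, and linear interpolation with a collection-wide model so that a query word missing from a document does not force a zero probability. The smoothing step, the weight `lam`, the function names, and the two toy documents are all assumptions made for illustration, not details taken from the text.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return text.lower().split()


def query_log_likelihood(query, doc_counts, collection_counts, lam=0.5):
    """Log-probability of the query under a smoothed unigram document model.

    P(term) = lam * P(term | document) + (1 - lam) * P(term | collection)
    The interpolation weight lam is a hypothetical default, not a tuned value.
    """
    doc_len = sum(doc_counts.values())
    coll_len = sum(collection_counts.values())
    score = 0.0
    for term in tokenize(query):
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, lam=0.5):
    """Return (name, score) pairs, best-scoring document first."""
    # One unigram model (a bag of word counts) per document.
    models = {name: Counter(tokenize(text)) for name, text in docs.items()}
    # The collection model pools the counts of all documents.
    collection = Counter()
    for counts in models.values():
        collection.update(counts)
    scored = [
        (name, query_log_likelihood(query, counts, collection, lam))
        for name, counts in models.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy documents, invented for this example.
docs = {
    "d1": "a bit of text about language models",
    "d2": "the weather report for the weekend",
}
ranking = rank("a bit of text", docs)
```

Here `ranking` lists `d1` first, because every query word occurs in `d1` and none occurs in `d2`. Working with log-probabilities avoids numerical underflow when many small probabilities are multiplied.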
