Parsimonious language models for information retrieval

We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.

[1]  Stephen E. Robertson,et al.  Relevance weighting of search terms , 1976, J. Am. Soc. Inf. Sci..

[2]  Djoerd Hiemstra,et al.  Disambiguation Strategies for Cross-Language Information Retrieval , 1999, ECDL.

[3]  Djoerd Hiemstra,et al.  The Importance of Prior Probabilities for Entry Page Search , 2002, SIGIR '02.

[4]  Yi Zhang,et al.  Novelty and redundancy detection in adaptive filtering , 2002, SIGIR '02.

[5]  Djoerd Hiemstra,et al.  Language Modelling and Relevance , 2003 .

[6]  W. Bruce Croft,et al.  Relevance Models in Information Retrieval , 2003 .

[7]  Andreas Stolcke,et al.  Improved modeling and efficiency for automatic transcription of Broadcast News , 2002, Speech Commun..

[8]  Djoerd Hiemstra,et al.  Twenty-One at TREC7: Ad-hoc and Cross-Language Track , 1998, TREC.

[9]  Jinxi Xu,et al.  Evaluating a probabilistic model for cross-lingual information retrieval , 2001, SIGIR '01.

[10]  John D. Lafferty,et al.  Information retrieval as statistical translation , 1999, SIGIR '99.

[11]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[12]  Nuno Vasconcelos,et al.  Bayesian models for visual information retrieval , 2000 .

[13]  Richard M. Schwartz,et al.  A hidden Markov model information retrieval system , 1999, SIGIR '99.

[14]  Andreas Stolcke,et al.  Entropy-based Pruning of Backoff Language Models , 2000, ArXiv.

[15]  Kenney Ng A Maximum Likelihood Ratio Information Retrieval Model , 1999, TREC.

[16]  Thijs Westerveld,et al.  Multimedia Retrieval Using Multiple Examples , 2004, CIVR.

[17]  Yi Zhang,et al.  Exact Maximum Likelihood Estimation for Word Mixtures , 2002 .

[18]  Marcello Federico,et al.  Statistical cross-language information retrieval using n-best query translations , 2002, SIGIR '02.

[19]  Tefko Saracevic,et al.  RELEVANCE: A review of and a framework for the thinking on the notion in information science , 1997, J. Am. Soc. Inf. Sci..

[20]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[21]  John D. Lafferty,et al.  Model-based feedback in the language modeling approach to information retrieval , 2001, CIKM '01.

[22]  Ellen M. Voorhees,et al.  Overview of TREC 2003 , 2003, TREC.

[23]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[24]  Jay Ponte,et al.  LANGUAGE MODELS FOR RELEVANCE FEEDBACK , 2002 .

[25]  W. Bruce Croft,et al.  A general language model for information retrieval , 1999, CIKM '99.

[26]  Richard M. Schwartz,et al.  Topic tracking for radio, TV broadcast, and newswire , 1999, EUROSPEECH.

[27]  Stefano Mizzaro Relevance: the whole history , 1997 .