Comparing Weighting Models for Monolingual Information Retrieval

Motivated by the hypothesis that the retrieval performance of a weighting model is independent of the language in which the queries and the collection are expressed, we compared three weighting models, Okapi, statistical language modeling (SLM), and deviation from randomness (DFR), on three monolingual test collections: French, Italian, and Spanish. DFR consistently achieved better results than both Okapi and SLM, which performed comparably to each other. We also evaluated whether retrieval feedback improved retrieval performance: it was beneficial for DFR and Okapi and detrimental for SLM. Beyond its relative performance, DFR with retrieval feedback achieved excellent absolute results: the best run for Italian and Spanish, and the third-best run for French.
