The effectiveness of combining information retrieval strategies for European languages

Building an effective information retrieval system requires various design choices, ranging from the weighting scheme to the type of morphological normalization. Combining runs has become a standard technique for reaping the benefits of different run types. Until now, systematic studies of the effectiveness of combination strategies have been carried out only for English. This paper provides an exploratory overview of the effectiveness of combination methods in nine European languages. We demonstrate that combining effective information retrieval strategies can lead to significant improvements in retrieval effectiveness. Furthermore, we analyze the relative impact of retrieving more relevant documents and of improving the ranking of relevant documents. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
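To make the idea of combining runs concrete, here is a minimal sketch of two widely used score-fusion rules, CombSUM and CombMNZ, applied to min-max normalized scores. The abstract does not say which combination methods or normalization the paper evaluates, so this choice is an assumption for illustration, and the names `min_max_normalize` and `combine` are hypothetical.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Scale one run's scores to [0, 1]. A run maps document id -> score."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def combine(runs, method="combsum"):
    """Fuse several runs into one ranking (illustrative, not the paper's code).

    combsum: sum of the normalized scores of a document over all runs.
    combmnz: that sum multiplied by the number of runs retrieving the document.
    """
    summed = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            summed[doc] += score
            hits[doc] += 1

    if method == "combsum":
        fused = dict(summed)
    elif method == "combmnz":
        fused = {doc: summed[doc] * hits[doc] for doc in summed}
    else:
        raise ValueError(f"unknown method: {method}")

    # Highest fused score first; ties are broken by document id.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from a word-based run and a character n-gram run.
word_run = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
ngram_run = {"d2": 0.8, "d4": 0.6, "d1": 0.2}

for doc, score in combine([word_run, ngram_run], method="combsum"):
    print(doc, round(score, 3))
```

The fused ranking covers the union of both runs, so a document found by only one run (here `d4`) can still be returned, while documents found by both runs tend to move up. These are the two effects the abstract distinguishes: retrieving more relevant documents and ranking relevant documents better.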
