UniNE at Domain-Specific IR - CLEF 2008: Scientific Data Retrieval: Various Query Expansion Approaches

Our first objective in participating in this domain-specific evaluation campaign is to propose and evaluate various indexing and search strategies for the German, English and Russian languages, in an effort to obtain better retrieval effectiveness than that of the language-independent approach (n-gram). To do so, we run our evaluation on the GIRT-4 test collection using the Okapi model, various IR models derived from the Divergence from Randomness (DFR) paradigm, and the statistical language model (LM), together with the classical tf.idf vector-processing scheme.
