Domain-Specific CLIR of English, German and Russian Using Fusion and Subject Metadata for Query Expansion

This paper describes the combined submissions of the Berkeley group for the domain-specific track at CLEF 2005. We test two techniques. The first is data fusion: merging the results of multiple probabilistic searches against different XML components, using both Logistic Regression (LR) algorithms and a version of the Okapi BM25 algorithm. In cross-language searching, we also combine multiple translations of each query. The second is query enhancement with domain-specific metadata (thesaurus terms). We describe Entry Vocabulary Modules, our technique for associating query words with thesaurus terms, and suggest its use for monolingual as well as bilingual retrieval. We also describe different weighting and merging schemes for adding keywords to queries, as well as translation techniques.
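
The abstract does not state the merging formula, so the following is only a minimal sketch of one common way to fuse two result lists: min-max normalize each run's scores, then take a weighted sum. The function names, the `weight_a` parameter, and the document scores are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so runs on different scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(run_a: Dict[str, float],
         run_b: Dict[str, float],
         weight_a: float = 0.5) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores (an assumed scheme, not the paper's)."""
    norm_a = min_max_normalize(run_a)
    norm_b = min_max_normalize(run_b)
    fused = {}
    for doc in set(norm_a) | set(norm_b):
        # A document missing from one run contributes 0 from that run.
        fused[doc] = (weight_a * norm_a.get(doc, 0.0)
                      + (1.0 - weight_a) * norm_b.get(doc, 0.0))
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores: LR yields probabilities, BM25 yields unbounded weights.
lr_run = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
bm25_run = {"d2": 12.0, "d3": 8.0, "d4": 4.0}
ranking = fuse(lr_run, bm25_run)
```

Normalizing first matters because LR estimates and BM25 weights live on different scales; without it, the run with the larger raw scores would dominate the sum.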
