First Participation of University and Hospitals of Geneva to Domain-Specific Track in CLEF 2008

We took part in our first Domain-Specific Track in 2008, aiming to establish a baseline for our Information Retrieval engine in a domain unfamiliar to us. We specialize in Natural Language Processing in the biomedical domain, and for four years we have participated in the medical image track and in TREC Genomics with textual strategies such as query expansion with controlled vocabularies, pattern recognition and vector space models. The technical component of our cross-language search engine is a generic toolkit, EasyIR, with which we can perform Text Categorization and Information Retrieval. The strategy applied for the 2008 Domain-Specific track is as simple as possible, since we only want to establish a baseline for EasyIR in a new track. For the English monolingual task, we chose to index documents using the title, the descriptive text and some types of classification terms. For the bilingual task (German queries on the English collection), we chose, on the one hand, to perform a simple retrieval on the German collection and, on the other hand, to collect the descriptors of the retrieved documents in order to perform cross-lingual query expansion. Unfortunately, our results cannot be considered good: we achieve a MAP of 0.171 for the monolingual task and a MAP of 0.132 for the bilingual task. Nevertheless, compared with several baseline runs of other participants in DS CLEF 2007, our baseline run achieves equal performance. Possible improvements for the next DS CLEF are better tuning of our system on the benchmark and an efficient use of the controlled vocabularies.
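The bilingual strategy described above (retrieve on the German collection, then reuse the descriptors of the retrieved documents as expansion terms) can be illustrated with a minimal Python sketch. It is not the EasyIR implementation: the term-overlap ranking, the toy collection, the field name `descriptors_en`, and the parameters `top_k` and `n_descriptors` are all assumptions made for illustration, including the assumption that each German record carries descriptors usable on the English side.

```python
from collections import Counter

# Toy German collection (hypothetical data); each record carries
# descriptors assumed to be available in English.
GERMAN_DOCS = {
    "d1": {"text": "Arbeitslosigkeit junger Menschen in Deutschland",
           "descriptors_en": ["unemployment", "youth"]},
    "d2": {"text": "Jugend und Arbeitslosigkeit in Europa",
           "descriptors_en": ["unemployment", "europe"]},
    "d3": {"text": "Migration und Integration",
           "descriptors_en": ["migration"]},
}

def tokenize(text):
    """Lowercase whitespace tokenization (a stand-in for real analysis)."""
    return text.lower().split()

def retrieve(query, docs, top_k=2):
    """Rank documents by term overlap with the query (a stand-in for a real engine)."""
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, doc in docs.items():
        score = sum(1 for term in tokenize(doc["text"]) if term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]

def expand_query(german_query, german_docs, top_k=2, n_descriptors=3):
    """Collect the most frequent descriptors of the top-ranked German documents."""
    counts = Counter()
    for doc_id in retrieve(german_query, german_docs, top_k):
        counts.update(german_docs[doc_id]["descriptors_en"])
    # Most frequent first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [descriptor for descriptor, _ in ranked[:n_descriptors]]

expansion_terms = expand_query("Arbeitslosigkeit Jugend", GERMAN_DOCS)
```

The resulting `expansion_terms` would then be submitted as a query against the English collection. Ranking descriptors by frequency among the top documents is one simple choice; a real system might weight them by retrieval score instead.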
