Experiences in evaluating multilingual and text‐image information retrieval

An important step in developing information retrieval (IR) processes is evaluating the output against the information needs of the user. The quality of the output depends both on how the different methods applied in the IR process are integrated and on the information contained in the retrieved documents, but how can “quality” be measured? Although some of these methods can be tested in isolation, it is not always clear what will happen when several of them are integrated. For this reason, much effort has gone into finding good combinations of methods and into correctly tuning the algorithms involved. The current approach is to measure the precision and recall obtained when different combinations of methods are included in an IR process. This article briefly describes the techniques and methods currently included in an IR system, paying special attention to the multilingual aspect of the problem. It also discusses their influence on the final performance of the IR process, drawing on the evaluation experience gained in two projects related to multilingual information retrieval, MIRACLE and OmniPaper. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 655–677, 2006.
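As a rough illustration of the precision and recall measurement mentioned above, the following sketch compares two hypothetical system configurations against a set of relevance judgments. The function name, the run labels, and the document identifiers are assumptions made for this example; they are not taken from MIRACLE or OmniPaper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgments and two hypothetical method combinations.
relevant_docs = ["d1", "d3", "d5"]
runs = {
    "stemming_only": ["d1", "d2", "d3", "d4"],
    "stemming_plus_expansion": ["d1", "d3", "d5", "d6", "d7", "d8"],
}

for name, retrieved_docs in runs.items():
    p, r = precision_recall(retrieved_docs, relevant_docs)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the second configuration retrieves every relevant document but returns more non-relevant ones, which is the kind of trade-off that comparing both figures across method combinations is meant to expose.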
