Using Translation Heuristics to Improve a Multimodal and Multilingual Information Retrieval System

Nowadays, the multimodal nature of the World Wide Web is evident: web sites that include video files, pictures, music and text have become widespread. Furthermore, multimodal collections in several languages call for multilingual information retrieval strategies. This paper describes a new retrieval technique applied to a multimodal and multilingual system that has been tested on two different multilingual image collections. The system applies several machine translators and implements some novel heuristics. These heuristics explore a variety of ways to combine the translations obtained from the given set of translators, configure the retrieval model with different weighting functions, and study the effect of pseudo-relevance feedback (PRF) in this domain. Our results show interesting effects of these variations, allowing us to determine the parameters of the best retrieval model for this data and to report the loss in performance for each language.
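
To make the idea of combining several translations of one query more concrete, the following is a minimal sketch of one possible combination heuristic. It is an assumption for illustration only, not the heuristics evaluated in the paper: each term is weighted by the fraction of translators whose output contains it. The function name `combine_translations` and the example queries are hypothetical.

```python
from collections import Counter


def combine_translations(translations):
    """Merge several machine translations of one query into weighted terms.

    Illustrative voting heuristic (an assumption, not the paper's method):
    a term's weight is the fraction of translations that contain it.
    """
    if not translations:
        return {}
    votes = Counter()
    for text in translations:
        # Count each term at most once per translation.
        votes.update(set(text.lower().split()))
    total = len(translations)
    return {term: count / total for term, count in votes.items()}


# Hypothetical outputs of three translators for the same source query.
candidates = [
    "old church tower",
    "ancient church tower",
    "old tower of church",
]
weights = combine_translations(candidates)
```

Terms on which all translators agree (here `church` and `tower`) receive the highest weight, while terms produced by a single translator are kept but down-weighted. The weighted terms could then be passed to whichever retrieval model and weighting function is under test.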
