Experiments for the Cross Language Speech Retrieval Task at CLEF 2006

This paper describes the University of Ottawa group's second participation in the Cross-Language Speech Retrieval (CL-SR) task, at CLEF 2006. We present the results of the submitted runs for the English collection and, very briefly, for the Czech collection, followed by many additional experiments. We used two Information Retrieval systems, SMART and Terrier, with several query expansion techniques (including a new method based on log-likelihood scores for collocations). Our experiments showed that query expansion does not help much for this collection. We also tested different Automatic Speech Recognition transcripts and combinations of them; retrieval results did not improve, probably because the speech recognition errors affected the words that matter most for retrieval. In the cross-language experiments, the queries were automatically translated by combining the results of several online machine translation tools. High-quality automatic translations (for French) led to results comparable with monolingual English, while performance decreased for the other languages. Indexing the manual summaries and keywords gave the best retrieval results.
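The abstract does not spell out how the log-likelihood scores for collocations are computed or used. As a rough illustration only, the sketch below assumes the standard log-likelihood ratio (G^2) over a 2x2 bigram contingency table and a simple rule that adds the strongest collocates of each query term. The function names (`log_likelihood`, `score_bigrams`, `expansion_terms`), the `top_k` parameter, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def log_likelihood(n11, c1, c2, n):
    """G^2 score for a bigram (w1, w2).

    n11: count of the bigram, c1: bigrams starting with w1,
    c2: bigrams ending with w2, n: total number of bigrams.
    """
    observed = [
        [n11, c1 - n11],
        [c2 - n11, n - c1 - c2 + n11],
    ]
    rows = [c1, n - c1]
    cols = [c2, n - c2]
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def score_bigrams(tokens):
    """Score every adjacent word pair in a token list."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # words in first position of a bigram
    second = Counter(tokens[1:])   # words in second position of a bigram
    n = len(tokens) - 1
    return {
        (w1, w2): log_likelihood(count, first[w1], second[w2], n)
        for (w1, w2), count in bigrams.items()
    }


def expansion_terms(query_terms, scores, top_k=2):
    """Hypothetical expansion rule: keep the top_k collocates of query terms."""
    query = set(query_terms)
    best = {}
    for (w1, w2), score in scores.items():
        for term, partner in ((w1, w2), (w2, w1)):
            if term in query and partner not in query:
                best[partner] = max(score, best.get(partner, 0.0))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best, key=lambda w: (-best[w], w))[:top_k]


# Toy corpus, invented for illustration.
tokens = "new york city is big . new york is old . the city is new".split()
scores = score_bigrams(tokens)
print(expansion_terms(["new"], scores, top_k=1))
```

Note that G^2 is two-sided: it is also large for pairs that co-occur less often than chance would predict, so a practical version would likely keep only pairs whose observed count exceeds the expected count.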
