University of Ottawa's Participation in the CL-SR Task at CLEF 2006

This paper presents the University of Ottawa group's second participation in the Cross-Language Speech Retrieval (CL-SR) task at CLEF. We report the results of the submitted runs for the English collection and, very briefly, for the Czech collection, followed by many additional experiments. We used two Information Retrieval systems, SMART and Terrier, testing many different weighting schemes for indexing the documents and the queries, as well as several query expansion techniques (including a new method based on log-likelihood scores for collocations). Our experiments showed that query expansion methods do not help much on this collection. We tested whether the new Automatic Speech Recognition (ASR) transcripts improve the retrieval results; we also tested combinations of different automatic transcripts (with different estimated word error rates). The retrieval results did not improve, probably because the speech recognition errors affected words that are important for retrieval, even in the newer ASR2006 transcripts. By using different system settings, we improved on our submitted result for the required run (English queries, title and description) on automatic transcripts plus automatic keywords. We also present cross-language experiments, in which the queries are automatically translated by combining the results of several online machine translation tools. High-quality automatic translations (for French) led to results comparable with monolingual English retrieval, while performance decreased for the other query languages. Indexing the manual summaries and manual keywords gave the best retrieval results.
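The log-likelihood score for collocations mentioned above is standardly computed with Dunning's likelihood-ratio statistic over a 2x2 contingency table of bigram and unigram counts. The sketch below is an illustration of that general technique, not the paper's exact implementation; the function names and the toy token list are our own.

```python
import math
from collections import Counter
from itertools import islice

def _denorm_entropy(counts):
    """Unnormalised Shannon entropy: -sum k*log(k/N), skipping zero cells."""
    n = sum(counts)
    return -sum(k * math.log(k / n) for k in counts if k > 0)

def llr_bigram(c12, c1, c2, n):
    """Dunning's log-likelihood ratio for the bigram (w1, w2).

    c12: bigram count, c1/c2: unigram counts of w1/w2, n: total bigrams.
    Higher scores mean the pair co-occurs more than chance predicts.
    """
    # 2x2 contingency table: w2 present / absent, given w1 present / absent.
    k11 = c12
    k12 = c1 - c12
    k21 = c2 - c12
    k22 = n - c1 - c2 + c12
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    return 2.0 * (_denorm_entropy(rows) + _denorm_entropy(cols)
                  - _denorm_entropy([k11, k12, k21, k22]))

def top_collocations(tokens, k=10):
    """Rank adjacent word pairs by LLR; the top pairs are collocation
    candidates that a query expansion step could add as phrases."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, islice(tokens, 1, None)))
    n = sum(bigrams.values())
    scored = [(w1, w2, llr_bigram(c, unigrams[w1], unigrams[w2], n))
              for (w1, w2), c in bigrams.items()]
    return sorted(scored, key=lambda t: -t[2])[:k]

# Toy example (illustrative data only): "new york" always co-occurs,
# so it outranks pairs that also occur with other neighbours.
tokens = "new york is big new york is busy boston is big".split()
print(top_collocations(tokens, 3))
```

In a query expansion setting, pairs scoring above a chosen LLR threshold would be treated as phrases and added to the query; the threshold and the candidate window are tuning choices.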