This paper presents the second participation of the University of Ottawa group in the CLEF Cross-Language Speech Retrieval (CL-SR) task. We report the results of our submitted runs for the English collection and, very briefly, for the Czech collection, followed by many additional experiments. We used two information retrieval systems, SMART and Terrier, tested with many different weighting schemes for indexing the documents and the queries, and with several query expansion techniques (including a new method based on log-likelihood scores for collocations). Our experiments showed that query expansion does not help much for this collection. We tested whether the new Automatic Speech Recognition (ASR) transcripts improve the retrieval results, and we also tested combinations of different automatic transcripts (with different estimated word error rates). The retrieval results did not improve, probably because the speech recognition errors affected words that are important for retrieval, even in the newer ASR2006 transcripts. By using different system settings, we improved on our submitted result for the required run (English queries, title and description fields) on automatic transcripts plus automatic keywords. We also present cross-language experiments in which the queries are automatically translated by combining the output of several online machine translation tools. High-quality automatic translations (for French) led to results comparable with monolingual English, while performance decreased for the other languages. Experiments on indexing the manual summaries and manual keywords gave the best retrieval results.
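The query-expansion method mentioned above ranks candidate collocations by a log-likelihood score. The abstract does not spell out the statistic, so the sketch below shows only the standard Dunning log-likelihood ratio (G^2) for a bigram, computed from a 2x2 contingency table of observed versus expected counts under independence; the function name and the sample counts are illustrative assumptions, not code or numbers taken from the paper.

    import math

    def log_likelihood_ratio(n11, c1, c2, n):
        # n11: occurrences of the bigram (w1, w2)
        # c1, c2: total occurrences of w1 and of w2
        # n: total number of bigrams in the collection
        n12 = c1 - n11            # w1 followed by something other than w2
        n21 = c2 - n11            # w2 preceded by something other than w1
        n22 = n - c1 - c2 + n11   # neither w1 nor w2
        observed = [n11, n12, n21, n22]
        # expected counts under the independence assumption
        expected = [
            c1 * c2 / n,
            c1 * (n - c2) / n,
            (n - c1) * c2 / n,
            (n - c1) * (n - c2) / n,
        ]
        # G^2 = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing
        return 2.0 * sum(o * math.log(o / e)
                         for o, e in zip(observed, expected) if o > 0)

    # Hypothetical counts: bigram seen 20 times, w1 seen 50 times, w2 seen 60
    # times, in a collection of 100,000 bigrams; a high score suggests a
    # collocation worth adding as an expansion term.
    print(log_likelihood_ratio(20, 50, 60, 100000))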