Cross-Language Multi-Media Information Retrieval

We describe here the evaluation of a cross-language information retrieval technique based on similarity thesauri in a multi-media document environment, such as is likely to be found in a digital library. We present the theory of similarity thesauri, which are information structures derived from corpora, and show how they can be used for cross-language retrieval. In evaluating our similarity thesaurus-based approach to cross-language retrieval over a parallel collection of legal texts, we show that cross-language retrieval can perform as well as monolingual retrieval in certain cases. We also present the results of a first evaluation of cross-language retrieval with spoken news material. We conclude that providing cross-language access to multi-media digital libraries is already a viable possibility.
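The abstract does not give the thesaurus construction in detail. The sketch below illustrates the general idea of a corpus-derived similarity thesaurus used for query translation, and it rests on simplifying assumptions. Each term is represented as a vector over the aligned document pairs of a parallel corpus. Weights are 1 + log(tf), normalized to unit length, which is an assumed, simplified weighting and not necessarily the scheme used in this work. Term-to-term similarity is the cosine between these vectors. A source-language query is translated by summing, for each target-language term, its similarity to every query term and keeping the `top_k` best. The function names, `top_k`, and the toy English/German corpus are hypothetical.

```python
import math
from collections import defaultdict


def build_term_vectors(aligned_pairs, lang_index):
    """Represent every term of one language as a vector over aligned pair ids.

    aligned_pairs: list of (tokens_lang_a, tokens_lang_b) tuples (hypothetical input format).
    lang_index: 0 selects language A, 1 selects language B.
    Weighting is an assumed simplification: 1 + log(tf), then unit-length normalization.
    """
    vectors = defaultdict(dict)
    for pair_id, pair in enumerate(aligned_pairs):
        counts = defaultdict(int)
        for tok in pair[lang_index]:
            counts[tok] += 1
        for term, tf in counts.items():
            vectors[term][pair_id] = 1.0 + math.log(tf)
    # Normalize each term vector so that a dot product equals the cosine.
    for vec in vectors.values():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        for pair_id in vec:
            vec[pair_id] /= norm
    return dict(vectors)


def cosine(u, v):
    """Cosine of two unit-length sparse vectors (iterates over the smaller one)."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(k, 0.0) for k, w in u.items())


def translate_query(query_terms, src_vectors, tgt_vectors, top_k=3):
    """Return the top_k target-language terms most similar to the whole query."""
    scores = defaultdict(float)
    for q in query_terms:
        qv = src_vectors.get(q)
        if qv is None:
            continue  # term never occurs in the parallel corpus: no translation evidence
        for t, tv in tgt_vectors.items():
            s = cosine(qv, tv)
            if s > 0.0:
                scores[t] += s
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Hypothetical toy parallel corpus: (English tokens, German tokens) per aligned pair.
TOY_PAIRS = [
    (["court", "appeal"], ["gericht", "berufung"]),
    (["court", "judgment"], ["gericht", "urteil"]),
    (["tax", "law"], ["steuer", "gesetz"]),
]

en_vectors = build_term_vectors(TOY_PAIRS, 0)
de_vectors = build_term_vectors(TOY_PAIRS, 1)
print(translate_query(["court"], en_vectors, de_vectors))
```

Both language versions of an aligned pair share the same pair id, so terms from different languages become directly comparable without a bilingual dictionary. In the toy data, "court" occurs in exactly the same pairs as "gericht", so that term ranks first.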
