Paper Information - Scalable Multilingual Information Access

Scalable Multilingual Information Access

The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides an unprecedented opportunity to evaluate retrieval in eight different languages using a common set of topics and a uniform assessment methodology. This year the Johns Hopkins University Applied Physics Laboratory participated in the monolingual, bilingual, and multilingual retrieval tasks. We contend that information access in a plethora of languages requires approaches that are inexpensive in both developer and run-time costs. In this paper we describe a simplified approach that seems suitable for retrieval in many languages; we also show that good retrieval is possible over many languages, even when translation resources are scarce or when query-time translation is infeasible. In particular, we investigate the use of character n-grams for monolingual retrieval, CLIR between related languages using partial morphological matches, and translation of document representations to an interlingua for computationally efficient retrieval against multiple languages.
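As a rough illustration of the character n-gram indexing mentioned in the abstract, the following minimal sketch splits text into overlapping character n-grams without any language-specific stemming or word lists. The function name `char_ngrams`, the default `n = 4`, and the underscore padding at word boundaries are assumptions made for this example; the abstract does not specify the paper's actual tokenization.

```python
from typing import List


def char_ngrams(text: str, n: int = 4) -> List[str]:
    """Return overlapping character n-grams that may span word boundaries.

    Hypothetical helper for illustration only; the choice of n and the
    underscore padding are assumptions, not details taken from the paper.
    """
    # Lowercase, collapse runs of whitespace, and mark word boundaries with "_"
    # so that word-initial and word-final n-grams are distinguishable.
    normalized = "_" + "_".join(text.lower().split()) + "_"
    if len(normalized) < n:
        # Text shorter than n yields a single, shorter term.
        return [normalized]
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


print(char_ngrams("the cat"))
# ['_the', 'the_', 'he_c', 'e_ca', '_cat', 'cat_']
```

Because the terms come from raw character sequences, the same routine can be applied unchanged to any language written with a segmentable script, which is one way to keep per-language developer cost low.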

James Mayfield | Paul McNamee

[1] James Mayfield, et al. Comparing cross-language query expansion techniques by degrading translation resources, 2002, SIGIR '02.

[2] Carol Peters, et al. CLEF Methodology and Metrics, 2001, CLEF.

[3] Ellen M. Voorhees, et al. The Philosophy of Information Retrieval Evaluation, 2001, CLEF.

[4] James Mayfield, et al. JHU/APL Experiments at CLEF: Translation Resources and Score Normalization, 2001, CLEF.

[5] Fredric C. Gey, et al. Manual Queries and Machine Translation in Cross-Language Retrieval and Interactive Retrieval with Cheshire II at TREC-7, 1998, TREC.

[6] Claire Cardie, et al. Using clustering and SuperConcepts within SMART: TREC 6, 1997, Inf. Process. Manag.

[7] Wessel Kraaij, et al. TNO at CLEF-2001: Comparing Translation Resources, 2001, CLEF.

[8] Carol Peters, et al. CLEF 2002 Methodology and Metrics, 2002, CLEF.