The third Cross-Language Evaluation Forum workshop (CLEF-2002) provided an unprecedented opportunity to evaluate retrieval in eight different languages using a common set of topics and a uniform assessment methodology. This year the Johns Hopkins University Applied Physics Laboratory participated in the monolingual, bilingual, and multilingual retrieval tasks. We contend that information access across many languages requires approaches that are inexpensive in both development and run-time costs. In this paper we describe a simplified approach that seems suitable for retrieval in many languages; we also show that good retrieval is possible across many languages, even when translation resources are scarce or query-time translation is infeasible. In particular, we investigate the use of character n-grams for monolingual retrieval, CLIR between related languages using partial morphological matches, and translation of document representations to an interlingua for computationally efficient retrieval against multiple languages.
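The abstract does not give the indexing details, but the core idea of character n-gram matching, including its tolerance for partial morphological matches between related languages, can be sketched roughly as follows. This is an illustrative sketch only: the function names, the n-gram length of 4, and the Dice-coefficient scoring are assumptions, not the paper's actual method.

```python
def char_ngrams(text, n=4):
    """Return the set of overlapping character n-grams in `text`.

    Lowercasing and keeping spaces lets n-grams span word boundaries;
    shared grams between morphological variants (e.g. a shared stem)
    survive even when suffixes differ. The choice n=4 is an assumption
    for illustration, not the paper's tuned value.
    """
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_overlap(query, document, n=4):
    """Score a document by the Dice coefficient of its n-gram set
    against the query's n-gram set (a simple stand-in for a full
    ranked-retrieval model)."""
    q, d = char_ngrams(query, n), char_ngrams(document, n)
    if not q or not d:
        return 0.0
    return 2 * len(q & d) / (len(q) + len(d))
```

For example, `ngram_overlap("retrieval", "retrieving")` is nonzero because the two words share the grams `retr`, `etri`, `trie`, and `riev`, while an unrelated word scores zero; no stemmer or bilingual dictionary is needed, which is what makes the approach cheap to deploy across many languages.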