Scalable Multilingual Information Access

The third Cross-Language Evaluation Forum workshop (CLEF-2002) provides the unprecedented opportunity to evaluate retrieval in eight different languages using a common set of topics and a uniform assessment methodology. This year the Johns Hopkins University Applied Physics Laboratory participated in the monolingual, bilingual, and multilingual retrieval tasks. We contend that information access in a plethora of languages requires approaches that are inexpensive in developer and run-time costs. In this paper we describe a simplified approach that seems suitable for retrieval in many languages; we also show how good retrieval is possible over many languages, even when translation resources are scarce, or when query-time translation is infeasible. In particular, we investigate the use of character n-grams for monolingual retrieval, CLIR between related languages using partial morphological matches, and translation of document representations to an interlingua for computationally efficient retrieval against multiple languages.