Structured translation for cross-language information retrieval

The paper introduces a query translation model that reflects the structure of the cross-language information retrieval task. The model is based on a structured bilingual dictionary in which the translations of each term are clustered into groups with distinct meanings. Query translation is modeled as a two-stage process: the system first determines the intended meaning of a query term and then selects translations appropriate to that meaning that might appear in the document collection. An implementation of structured translation based on automatic dictionary clustering is described and evaluated by using Chinese queries to retrieve English documents. For very short queries, structured translation achieved an average precision that was statistically indistinguishable from that of Pirkola's technique, but Pirkola's technique outperformed structured translation on long queries. The paper concludes with some observations on future work to improve retrieval effectiveness and on other potential uses of structured translation in interactive cross-language retrieval applications.
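The two-stage process described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the dictionary entry, the collection frequencies, the function names, and in particular the stage-one heuristic (choosing the meaning cluster whose translations occur most often in the collection), since the abstract does not say how the intended meaning is determined.

```python
# Hypothetical structured bilingual dictionary: each source term maps to
# clusters of English translations, one cluster per distinct meaning.
# The entry below is invented for illustration.
STRUCTURED_DICT = {
    "花": [
        ["flower", "blossom", "bloom"],  # meaning 1: the plant part
        ["spend", "expend"],             # meaning 2: to use up money or time
    ],
}

# Hypothetical term frequencies in the English document collection.
COLLECTION_FREQ = {"flower": 12, "bloom": 3, "spend": 40, "budget": 7}


def select_meaning(clusters, coll_freq):
    """Stage 1 (assumed heuristic): pick the cluster whose translations
    have the highest total frequency in the collection."""
    return max(clusters, key=lambda c: sum(coll_freq.get(t, 0) for t in c))


def translate_term(term, dictionary, coll_freq):
    """Stage 2: keep only translations of the chosen meaning that
    actually occur in the document collection."""
    clusters = dictionary.get(term)
    if not clusters:
        return []  # term not covered by the dictionary
    chosen = select_meaning(clusters, coll_freq)
    return [t for t in chosen if coll_freq.get(t, 0) > 0]


def translate_query(terms, dictionary, coll_freq):
    """Translate each query term independently."""
    return {t: translate_term(t, dictionary, coll_freq) for t in terms}


if __name__ == "__main__":
    print(translate_query(["花"], STRUCTURED_DICT, COLLECTION_FREQ))
```

In this sketch the meaning-selection step is a separate function so that a different disambiguation strategy, including asking the user in an interactive application, could be substituted without changing the second stage.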

[1] Dekang Lin, et al. PRINCIPAR - An Efficient, Broad-coverage, Principle-based Parser, 1994, COLING.

[2] Gregory Grefenstette, et al. Cross-Language Information Retrieval, 1998, The Springer International Series on Information Retrieval.

[3] Jean Paul Ballerini, et al. Experiments in multilingual information retrieval using the SPIDER system, 1996, SIGIR '96.

[4] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[5] Ari Pirkola, et al. The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval, 1998, SIGIR '98.

[6] Hsin-Hsi Chen, et al. Resolving Translation Ambiguity and Target Polysemy in Cross-Language Information Retrieval, 1999, ACL.

[7] Takenobu Tokunaga, et al. The Use of WordNet in Information Retrieval, 1998, WordNet@ACL/COLING.

[8] Douglas W. Oard, et al. Support for Interactive Document Selection in Cross-Language Information Retrieval, 1999, Inf. Process. Manag.

[9] Mark W. Davis, et al. Getting Information from Documents You Cannot Read: An Interactive Cross-Language Text Retrieval and Summarization System, 1999.

[10] Jaana Kekäläinen, et al. The impact of query structure and query expansion on retrieval performance, 1998, SIGIR '98.

[11] Dekang Lin, et al. An Information-Theoretic Definition of Similarity, 1998, ICML.

[12] Barbara M. Wildemuth, et al. The transition from formalized need to compromised need in the context of clinical problem solving, 1999.

[13] Susan McRoy, et al. Using Multiple Knowledge Sources for Word Sense Discrimination, 1992, Comput. Linguistics.

[14] J. Scott McCarley. Should we Translate the Documents or the Queries in Cross-language Information Retrieval?, 1999, ACL.

[15] Marcia J. Bates, et al. Subject access in online catalogs: A design model, 1986, J. Am. Soc. Inf. Sci.