Using Structured Queries for Disambiguation in Cross-Language Information Retrieval

Bilingual transfer dictionaries are an important resource for query translation in cross-language text retrieval. However, term translation is not an isomorphic process, so dictionary-based systems must address the problem of ambiguity in language translation. In this paper, we claim that boolean conjunction (the AND operator) provides simple and automatic disambiguation in the target language. We derive a new weighted boolean model based on a probabilistic formulation and apply it to the cross-language text retrieval problem. The results suggest that the weighted boolean model is highly effective for general text retrieval, but more experimental evidence is needed to conclude that it is particularly advantageous for cross-language applications. Nonetheless, the preliminary results are quite promising.
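The intuition behind conjunction as a disambiguator can be illustrated with a minimal sketch. Each source term expands into a group of candidate translations (combined with a soft OR), and the groups are combined with a soft AND, so a document that matches a translation from only one group scores low. The toy dictionary, the belief values `hit` and `miss`, and the product-based operators are all illustrative assumptions; this is not the weighted boolean model derived in the paper.

```python
from math import prod

# Hypothetical toy dictionary: each source term has several candidate translations.
DICTIONARY = {
    "banco": ["bank", "bench"],
    "dinero": ["money", "cash"],
}

def translate(query_terms):
    """Return one group of candidate translations per source term."""
    return [DICTIONARY.get(term, [term]) for term in query_terms]

def soft_or(beliefs):
    """Probabilistic OR: high if at least one belief is high."""
    return 1.0 - prod(1.0 - b for b in beliefs)

def soft_and(beliefs):
    """Probabilistic AND: high only if every belief is high."""
    return prod(beliefs)

def score(doc_terms, groups, hit=0.9, miss=0.1):
    """Score a document against an AND of OR-groups.

    hit and miss are assumed belief values for a present or absent term.
    """
    group_beliefs = []
    for group in groups:
        beliefs = [hit if term in doc_terms else miss for term in group]
        group_beliefs.append(soft_or(beliefs))
    return soft_and(group_beliefs)

groups = translate(["banco", "dinero"])
finance_doc = {"bank", "money", "loan"}
park_doc = {"bench", "park", "tree"}

# The finance document matches a translation in both groups; the park
# document matches only the unintended sense of "banco".
print(score(finance_doc, groups), score(park_doc, groups))
```

In this sketch the wrong sense "bench" is never ruled out explicitly. It is simply penalized because no translation of the second query term co-occurs with it, which is the sense in which the AND operator disambiguates automatically.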
