The Effects of Conjunction, Facet Structure, and Dictionary Combinations in Concept-Based Cross-Language Retrieval

This paper studies concept-based cross-language information retrieval (CLIR). The document collection was a subset of the TREC collection, and the test requests were formed from TREC's health-related topics. Two translation dictionaries were used: a general dictionary and a domain-specific (medical) dictionary. The study examined the effects of translation method, conjunction, and facet order on the effectiveness of concept-based cross-language queries, and compared concept-based structuring of cross-language queries with mechanical structuring based on the output of the dictionaries. The performance of translated Finnish queries against English documents was compared with that of the original English queries against the same documents, and the different CLIR query types were compared with one another. No major difference was found between concept-based and mechanical structuring. The best translation method was a simultaneous look-up in the medical dictionary and the general dictionary, in which case cross-language queries performed as well as the original English queries. Cross-language queries performed well relative to monolingual queries especially at high levels of exhaustivity (the number of mutually restrictive concepts in a request), which suggests that conjunction disambiguates cross-language queries. The relative importance of the request concepts was also studied extensively. On the basis of classification data for the request concepts, the paper shows how the order of facets in a query affects both cross-language and monolingual queries.
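The query construction described above can be illustrated with a minimal sketch. It is not the study's implementation: the dictionary entries are made-up toy data, the function names `translate` and `build_query` are hypothetical, and the generic `AND`/`OR` notation stands in for whatever query language the retrieval system actually used. Each facet (one concept of the request) becomes a disjunction of all translations found by looking up both dictionaries at once, and the facets are joined by conjunction; the `exhaustivity` parameter keeps only the first n facets, so facet order determines which concepts survive.

```python
from typing import Dict, List, Optional

# Toy Finnish -> English entries, invented for illustration only.
MEDICAL_DICT: Dict[str, List[str]] = {
    "sydän": ["heart", "cardiac"],
    "infarkti": ["infarction"],
}
GENERAL_DICT: Dict[str, List[str]] = {
    "sydän": ["heart", "core"],
    "hoito": ["care", "treatment", "management"],
}


def translate(term: str, dictionaries: List[Dict[str, List[str]]]) -> List[str]:
    """Look the term up in every dictionary and merge the results.

    Duplicates are dropped while keeping first-seen order. A term found
    in no dictionary is passed through unchanged (an assumption here).
    """
    merged: List[str] = []
    for dictionary in dictionaries:
        for candidate in dictionary.get(term, []):
            if candidate not in merged:
                merged.append(candidate)
    return merged or [term]


def build_query(
    facets: List[List[str]],
    dictionaries: List[Dict[str, List[str]]],
    exhaustivity: Optional[int] = None,
) -> str:
    """Build a conjunction of facets, each a disjunction of translations."""
    selected = facets if exhaustivity is None else facets[:exhaustivity]
    groups: List[str] = []
    for facet in selected:
        alternatives: List[str] = []
        for term in facet:
            for candidate in translate(term, dictionaries):
                if candidate not in alternatives:
                    alternatives.append(candidate)
        groups.append("(" + " OR ".join(alternatives) + ")")
    return " AND ".join(groups)


# Facets are listed in (assumed) order of importance.
facets = [["sydän"], ["infarkti"], ["hoito"]]
both = [MEDICAL_DICT, GENERAL_DICT]

full_query = build_query(facets, both)
short_query = build_query(facets, both, exhaustivity=2)
print(full_query)
print(short_query)
```

In this toy setting the ambiguous translation `core` for the first facet can only match documents that also satisfy the other facets, which is the intuition behind conjunction acting as a disambiguator.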
