Query term disambiguation for Web cross-language information retrieval using a search engine

With the worldwide growth of the Internet, research on Cross-Language Information Retrieval (CLIR) has attracted considerable attention. Existing CLIR approaches based on query translation require parallel or comparable corpora to disambiguate translated query terms. However, such language resources are not readily available. In this paper, we propose a disambiguation method for dictionary-based query translation that does not depend on such scarce language resources, yet achieves adequate retrieval effectiveness by using Web documents as a corpus and exploiting co-occurrence information between terms within that corpus. In our experiments, the method achieved 97% of the average precision obtained with manual translation.
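The idea of choosing among dictionary translations by their co-occurrence in Web documents can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state which co-occurrence measure or search engine is used, so the Dice coefficient, the `toy_hit_count` function (which stands in for search engine hit counts), the toy corpus, and the candidate translations are all assumptions made for illustration.

```python
from itertools import combinations, product
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for Web documents (assumption); a real system would
# obtain hit counts by sending queries to a search engine.
TOY_CORPUS = [
    {"bank", "interest", "rate"},
    {"bank", "interest", "loan"},
    {"river", "shore", "water"},
    {"shore", "water", "hobby"},
    {"bank", "loan"},
]

HitCounter = Callable[[Sequence[str]], int]


def toy_hit_count(terms: Sequence[str]) -> int:
    """Count toy documents containing every term (simulates an AND query)."""
    return sum(1 for doc in TOY_CORPUS if all(t in doc for t in terms))


def dice(a: str, b: str, hit_count: HitCounter) -> float:
    """Dice coefficient from hit counts (an assumed co-occurrence measure)."""
    denom = hit_count([a]) + hit_count([b])
    return 2.0 * hit_count([a, b]) / denom if denom else 0.0


def disambiguate(
    candidates: List[List[str]], hit_count: HitCounter = toy_hit_count
) -> Tuple[str, ...]:
    """Pick one translation per source term, maximizing summed pairwise co-occurrence."""
    def score(combo: Tuple[str, ...]) -> float:
        return sum(dice(a, b, hit_count) for a, b in combinations(combo, 2))

    # Exhaustive search over all combinations; fine for short queries.
    return max(product(*candidates), key=score)


# Hypothetical dictionary output for a two-term source-language query:
# each inner list holds the candidate translations of one query term.
candidates = [["bank", "shore"], ["interest", "hobby"]]
best = disambiguate(candidates)
```

The exhaustive search grows exponentially with query length, which is tolerable here only because Web queries tend to be short; a longer query would call for a greedy or pairwise selection instead.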
