Literature Review of Cross Language Information Retrieval

Classical Information Retrieval (IR) is the task of selecting, from a large electronic store of documents, those most relevant to a user’s information need, which is expressed as a “query”. A web search engine, for example, performs IR when it retrieves relevant pages from the internet. Rather than treating foreign-language documents as unwanted “noise”, Cross Language Information Retrieval (CLIR) allows users to state a query in one language and retrieve documents in another. Some CLIR systems use language resources such as bilingual dictionaries to translate the user’s original query, while others use machine translation to translate the foreign-language documents beforehand, so that they can be retrieved by the original query. Problems arise from ambiguity in language, from the use of synonyms to express a single idea, and from the lack of context available when translating a short query. This paper discusses previous work in CLIR, examines current problems, and makes recommendations for future work.

Keywords: Cross Language Information Retrieval, Lexical Semantics, Disambiguation, Translation.
