Enhanced Query Expansion in English-Arabic CLIR

Arabic has a particularly large vocabulary, rich in near-synonyms that carry different shades of meaning. Modern Standard Arabic, which is used in formal writing, is the classical Arabic language supplemented with loanwords from foreign languages. Different synonyms and loanwords tend to be used in different texts, and Arabic composition style varies across the Arab countries (Abdelali, 2004). As a result, relevant documents can be overlooked when the query terms are synonyms of, or otherwise related to, the terms used in the document collection rather than identical to them. This can degrade the performance of a cross-lingual information retrieval (CLIR) system. Query expansion (QE) using the document collection is the usual approach to enriching translated queries with context-related terms. In this study, QE is explored for an English-Arabic CLIR system in which English queries are used to search Arabic documents. A thesaurus-based disambiguation approach is applied to further improve the effectiveness of QE. Experimental results show that QE enhanced by disambiguation improves retrieval effectiveness.

[1] W. Bruce Croft, et al. Phrasal translation and query expansion techniques for cross-language information retrieval, 1997, SIGIR '97.

[2] Ahmed Abdelali, et al. Arabic Information Retrieval Perspectives, 2004.

[3] Clement T. Yu, et al. An effective approach to document retrieval via utilizing WordNet and recognizing phrases, 2004, SIGIR '04.

[4] Douglas W. Oard, et al. CLIR Experiments at Maryland for TREC 2002: Evidence Combination for Arabic-English Retrieval, 2002, TREC.

[5] Michael E. Lesk, et al. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, 1986, SIGDOC '86.

[6] Jihad Mohamad Jaam, et al. Thesaurus-Based Query Disambiguation Method for Cross-Language Information Retrieval, 2002.

[7] Abdelghani Bellaachia, et al. Proper nouns in English-Arabic Cross Language Information Retrieval, 2008, 2008 IEEE Symposium on Computers and Communications.

[8] Ahmed Abdelali. Localization in Modern Standard Arabic, 2004, J. Assoc. Inf. Sci. Technol.

[9] John Murphy, et al. Using WordNet as a Knowledge Base for Measuring Semantic Similarity between Words, 1994.

[10] David A. Hull. Using Structured Queries for Disambiguation in Cross-Language Information Retrieval, 1997.

[11] George A. Miller, et al. Nouns in WordNet: A Lexical Inheritance System, 1990.

[12] Julio Gonzalo, et al. Indexing with WordNet synsets can improve text retrieval, 1998, WordNet@ACL/COLING.

[13] Rada Mihalcea, et al. Semantic Indexing using WordNet Senses, 2000.

[14] Abdelghani Bellaachia, et al. Proper nouns in English–Arabic cross language information retrieval, 2008.

[15] Noriko Kando, et al. Two-Stage Refinement of Transitive Query Translation with English Disambiguation for Cross-Language Information Retrieval: A Trial at CLEF 2004, 2004, CLEF.

[16] Ophir Frieder, et al. Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation, 2001, CIKM '01.

[17] George A. Miller, et al. Introduction to WordNet: An On-line Lexical Database, 1990.

[18] Noriko Kando, et al. Two-Stage Refinement of Transitive Query Translation with English Disambiguation for Cross-Language Information Retrieval: An Experiment at CLEF 2004, 2004, CLEF.

[19] W. Bruce Croft, et al. Resolving ambiguity for cross-language retrieval, 1998, SIGIR '98.