Learning to Exploit Different Translation Resources for Cross Language Information Retrieval

One important factor affecting the performance of Cross Language Information Retrieval (CLIR) is the quality of the translations it employs. To improve translation quality, the available resources must be exploited efficiently, yet combining translation resources with different characteristics poses many challenges. In this paper, we propose a method for exploiting the available translation resources simultaneously using Learning to Rank (LTR). To apply LTR methods to query translation, we define translation-relation-based features in addition to context-based features, which are extracted from the contextual information contained in the translation resources. The proposed method uses LTR to construct a translation ranking model from the defined features, and the constructed model is then used to rank the translation candidates of query words. To evaluate the proposed method, we perform English-Persian CLIR: we employ the translation ranking model to find translations of English queries and use those translations to retrieve Persian documents. Experimental results show that our approach significantly outperforms single-resource-based CLIR methods.
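
The idea of ranking translation candidates with a learned model can be illustrated with a minimal sketch. It is not the implementation used in the paper: the abstract names neither the LTR algorithm nor the exact features, so the sketch assumes a simple pairwise margin perceptron over a linear scoring function, and the four features (dictionary membership, parallel-corpus probability, comparable-corpus probability, context coherence), the candidate names, and all numeric values are invented for illustration.

```python
import random

# Hypothetical feature order for each translation candidate (an assumption,
# not taken from the paper):
# [in_dictionary, parallel_corpus_prob, comparable_corpus_prob, context_coherence]

def score(weights, features):
    """Linear score of one translation candidate."""
    return sum(w * f for w, f in zip(weights, features))

def train_pairwise(words, n_features, epochs=50, lr=0.1, seed=0):
    """Learn weights so correct translations outscore incorrect ones.

    words: one list per source word, each holding (features, label) tuples,
    where label 1 marks a correct translation and 0 an incorrect one.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    # Pairs are formed only among candidates of the same source word.
    pairs = [(pos, neg)
             for cands in words
             for pos, pos_label in cands if pos_label == 1
             for neg, neg_label in cands if neg_label == 0]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            # Update whenever the correct candidate does not win by a margin of 1.
            if score(weights, pos) - score(weights, neg) < 1.0:
                for i in range(n_features):
                    weights[i] += lr * (pos[i] - neg[i])
    return weights

def rank_candidates(weights, candidates):
    """Sort (name, features) candidates from best to worst score."""
    return sorted(candidates, key=lambda c: score(weights, c[1]), reverse=True)

# Toy training data with invented feature values.
training_words = [
    [([1.0, 0.6, 0.4, 0.7], 1),
     ([1.0, 0.1, 0.2, 0.1], 0),
     ([0.0, 0.05, 0.3, 0.2], 0)],
    [([0.0, 0.5, 0.5, 0.8], 1),
     ([1.0, 0.2, 0.1, 0.2], 0)],
]

weights = train_pairwise(training_words, n_features=4)

# Rank the candidates of a new query word (placeholder names and values).
ranked = rank_candidates(weights, [
    ("candidate_B", [1.0, 0.1, 0.1, 0.1]),
    ("candidate_A", [1.0, 0.7, 0.5, 0.6]),
])
print([name for name, _ in ranked])
```

In this sketch each translation resource contributes one feature, so the learned weights decide how much each resource is trusted when the resources disagree. The top-ranked candidates would then be used to build the target-language query.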
