Query Expansion in Resource-Scarce Languages

Search engine queries in resource-scarce languages often return no results, which frustrates the user. In such cases, the system should retrieve at least partially relevant documents. We propose a novel multilingual framework, MultiStructPRF, which expands the query with related terms by (i) using a resource-rich assisting language and (ii) weighting expansion terms according to where they occur in the document. Expanding the query with the help of an assisting language is intended to improve recall. We propose a systematic expansion model for weighting expansion terms drawn from different parts of the document, and a heuristics-based fusion model for combining the expansion terms from the query language and the assisting language. Our experimental results show an improvement over other pseudo-relevance feedback (PRF) techniques in both precision and recall for multiple resource-scarce languages, including Marathi, Bengali, Odia, and Finnish. We also study the effect of different assisting languages on precision and recall for multiple query languages. Our experiments reveal an interesting pattern: precision is positively correlated with the typological closeness of the query language and the assisting language, whereas recall is positively correlated with the resource richness of the assisting language.
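
The two ideas named in the abstract, position-dependent term weighting and fusion of terms from two languages, can be illustrated with a small sketch. This is not the MultiStructPRF implementation: the field names, the field weights, the linear interpolation used for fusion, and the function names are all assumptions made for illustration, since the abstract does not specify the actual weighting scheme or fusion heuristics. The sketch also assumes that terms from the assisting language have already been translated back into the query language.

```python
from collections import Counter

# Hypothetical field weights; the abstract does not state the real values.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def weighted_expansion_terms(feedback_docs, field_weights, query_terms, top_k=5):
    """Score candidate expansion terms from pseudo-relevant documents.

    Each document is a dict mapping a field name to a list of tokens.
    A term occurrence contributes the weight of the field it appears in.
    Returns the top_k terms with scores normalized to sum to 1.
    """
    scores = Counter()
    for doc in feedback_docs:
        for field, tokens in doc.items():
            weight = field_weights.get(field, 0.0)
            for token in tokens:
                if token not in query_terms:  # skip the original query terms
                    scores[token] += weight
    top = scores.most_common(top_k)
    total = sum(score for _, score in top)
    if total == 0:
        return {}
    return {term: score / total for term, score in top}


def fuse(query_lang_terms, assisting_lang_terms, alpha=0.5):
    """Combine two term distributions by linear interpolation (an assumed heuristic).

    alpha is the share given to the query-language terms.
    """
    fused = {}
    for term in set(query_lang_terms) | set(assisting_lang_terms):
        fused[term] = (alpha * query_lang_terms.get(term, 0.0)
                       + (1.0 - alpha) * assisting_lang_terms.get(term, 0.0))
    return fused


# Toy usage: one feedback document in the query language.
docs = [{"title": ["rain", "flood"], "body": ["flood", "river", "rain"]}]
source_terms = weighted_expansion_terms(docs, FIELD_WEIGHTS, {"rain"})
# Assumed to come from the assisting-language collection, already translated back.
assisting_terms = {"flood": 0.5, "dam": 0.5}
expanded = fuse(source_terms, assisting_terms, alpha=0.5)
```

In this toy setup a term seen in a title counts three times as much as one seen in the body, and `alpha` controls how much the final expansion trusts the query-language collection over the assisting-language collection.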
