Query Expansion for Effective Retrieval Results of Hindi–English Cross-Lingual IR

ABSTRACT Information retrieval (IR) is the science of identifying relevant documents or sub-documents in a collection of information or a database. Such a collection is not necessarily available in only one language, because information does not depend on language. Monolingual IR retrieves information in the language of the query, whereas cross-lingual information retrieval (CLIR) retrieves information in a language that differs from the query language. There is currently strong demand for CLIR systems because they let users extend their search for relevant documents across languages. Compared with monolingual IR, one of the biggest problems of CLIR is poor retrieval performance, which is caused by query mismatch, multiple representations of query terms, and untranslated query terms. Query expansion (QE) is the technique of reformulating a query by adding related terms to the original query. The purpose of QE is to improve the performance and quality of the information retrieved by a CLIR system. In this paper, QE is explored for Hindi–English CLIR, in which Hindi queries are used to search English documents. We use Okapi BM25 for document ranking, and the translated queries are then expanded using a term selection value. All experiments were performed on the FIRE 2012 dataset. Our results show that the relevance of Hindi–English CLIR results can be improved by adding the lowest-frequency term.
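The pipeline described in the abstract (BM25 ranking followed by expansion with a low-frequency term) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the term selection value, so the sketch simply picks the lowest-frequency non-query term from the top-ranked documents as a stand-in. The function names, the parameters `k1=1.2` and `b=0.75`, the `top_k` cutoff, the alphabetical tie-break, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are commonly used default values, assumed here.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_with_rarest_term(query_terms, docs, top_k=2):
    """Append the lowest-frequency non-query term from the top_k ranked docs.

    Hypothetical stand-in for the paper's term selection value.
    """
    scores = bm25_scores(query_terms, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    counts = Counter(
        t for i in ranked[:top_k] for t in docs[i] if t not in query_terms
    )
    if not counts:
        return list(query_terms)
    # Ties are broken alphabetically so the result is deterministic.
    rarest = min(counts, key=lambda t: (counts[t], t))
    return list(query_terms) + [rarest]


# Toy English collection; the query stands for an already translated Hindi query.
docs = [
    ["query", "expansion", "improves", "retrieval"],
    ["hindi", "english", "retrieval", "retrieval"],
    ["weather", "report"],
]
expanded = expand_with_rarest_term(["retrieval"], docs)
```

In a real system the documents would come from the indexed English collection and the query from the Hindi-to-English translation step; the selection rule would be replaced by the term selection value defined in the full paper.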
