English-Hindi Cross Language Information Retrieval System: Query Perspective

The abundance of non-English multilingual content on the internet creates a need for information retrieval systems that can cross language boundaries. Such cross-lingual information retrieval (CLIR) systems bridge the language gap by allowing a user to pose a query in a regional language and retrieve relevant documents in a different language. Finding relevant documents in a language different from the source language is the most challenging part of any cross-lingual information retrieval task. This paper discusses the development of a complete English-to-Hindi cross-language information retrieval system, along with the contribution of individual components to the system. Its main focus is the optimization of our disambiguation approach, which we call the ‘Two-Level Disambiguation method’. The experimental results indicate that adding an ‘Analyzer’ component to our CLIR architecture increases the efficiency of the proposed disambiguation algorithm.
