Ontology-based Tamil–English cross-lingual information retrieval system

Cross-lingual information retrieval (CLIR) systems let users query for information in one language and retrieve relevant documents in another. In general, a CLIR system translates the query from the source language into the target language and retrieves target-language documents based on the keywords in the translated query. However, ambiguity in the source and translated queries reduces the performance of the system. An ontology can be used to address this problem. Current ontology-based CLIR systems rely on manually constructed multilingual ontologies, which are expensive to build. Many methods exist to construct an ontology automatically for any domain in English, but not in other languages such as Tamil. To address these issues, we propose a methodology for a Tamil–English CLIR system that translates the Tamil query into English and retrieves English pages. Our approach uses a word sense disambiguation module to resolve ambiguity in the Tamil query, and an automatically constructed English ontology to address ambiguity in the English query. To translate a Tamil query into English, we have developed a morphological analyser for Tamil, a Tamil–English bilingual dictionary and a named entity database. The translated query is reformulated using the ontology, and the reformulated queries are submitted to a search engine to retrieve English documents from the Internet. We evaluated our methodology on the agriculture domain, and the results show that our approach outperforms other approaches in terms of precision.
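
The translate-then-reformulate flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the ontology contents, and the function names `translate` and `reformulate` are all hypothetical, the Tamil words are romanized, and the input is assumed to be already reduced to root forms and disambiguated (the morphological analyser and word sense disambiguation module are not modelled here).

```python
from itertools import product

# Toy bilingual dictionary (romanized Tamil root -> English).
# Hypothetical entries chosen only for illustration.
BILINGUAL_DICT = {"nel": "paddy", "noy": "disease"}

# Toy English ontology: concept -> related concepts (hypothetical content).
ONTOLOGY = {
    "paddy": ["rice"],
    "disease": ["blast", "blight"],
}


def translate(tokens):
    """Look up each root form; tokens missing from the dictionary pass through unchanged."""
    return [BILINGUAL_DICT.get(token, token) for token in tokens]


def reformulate(english_terms):
    """Build one query per combination of each term or one of its ontology neighbours."""
    options = [[term] + ONTOLOGY.get(term, []) for term in english_terms]
    return [" ".join(combo) for combo in product(*options)]


queries = reformulate(translate(["nel", "noy"]))
for query in queries:
    print(query)
```

Each generated string would then be sent to a search engine as a separate query. Enumerating every combination is only one possible reformulation strategy, and the number of queries grows multiplicatively with the number of related concepts per term.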
