Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's information need. Cross-language information retrieval (CLIR) is a form of information retrieval in which the language of the query differs from that of the searched documents. A fundamental issue in bilingual retrieval in search engines is how, and how often, users query with phrases and chunks. The main problem arises when existing bilingual dictionaries cannot meet users' actual needs for translating such phrases and chunks into another language, so the results are often unreliable. This paper reports the findings of an experiment carried out to address this problem. We introduce a heuristic method for extracting the correct equivalents of source-language chunks using monolingual and bilingual linguistic corpora as well as text classification algorithms. For this purpose, we use a statistical measure known as the Association Score (AS) to compute the association value between every two corresponding chunks in the corpus. The results of the experiment, which examined the effectiveness of the heuristic method in extracting all possible chunks in Persian and finding the most appropriate English equivalents for them, are very encouraging.
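To make the idea of scoring chunk pairs concrete, the following is a minimal sketch of how an association value between source and target chunks might be computed from a sentence-aligned parallel corpus. The abstract does not give the formula for the Association Score, so this sketch substitutes the Dice coefficient over co-occurrence counts as a stand-in; the function names, the data layout, and the transliterated toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def association_scores(aligned_pairs):
    """Score every co-occurring (source chunk, target chunk) pair.

    aligned_pairs: list of (source_chunks, target_chunks), one entry per
    aligned sentence pair. ASSUMPTION: the Dice coefficient stands in for
    the paper's Association Score, whose formula is not stated here.
    """
    src_freq = Counter()   # number of aligned pairs containing each source chunk
    tgt_freq = Counter()   # number of aligned pairs containing each target chunk
    pair_freq = Counter()  # number of aligned pairs containing both chunks

    for src_chunks, tgt_chunks in aligned_pairs:
        src_set, tgt_set = set(src_chunks), set(tgt_chunks)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        pair_freq.update(product(src_set, tgt_set))

    # Dice coefficient: 2 * joint count / (source count + target count)
    return {
        (s, t): 2 * n / (src_freq[s] + tgt_freq[t])
        for (s, t), n in pair_freq.items()
    }


def best_equivalent(scores, source_chunk):
    """Return the target chunk with the highest score for source_chunk."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_chunk}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus: transliterated Persian chunks with English chunks.
toy_corpus = [
    (["bazyabi ettelaat", "motor jostoju"],
     ["information retrieval", "search engine"]),
    (["bazyabi ettelaat"], ["information retrieval"]),
    (["motor jostoju"], ["search engine"]),
]

scores = association_scores(toy_corpus)
print(best_equivalent(scores, "bazyabi ettelaat"))
```

In this toy setting, a chunk pair that always co-occurs scores higher than one that co-occurs only by chance in a longer sentence, which is the intuition behind choosing the highest-scoring candidate as the equivalent. A real system would also need chunk extraction and the classification step mentioned above, neither of which is sketched here.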