Feasibility of Bootstrapping an Arabic WordNet Leveraging Parallel Corpora and an English WordNet

In this paper, we propose automatically bootstrapping a Modern Standard Arabic WordNet at the lexeme level using Arabic-English parallel corpora and an English WordNet. We address the feasibility of such an endeavor and present a qualitative evaluation of cross-linguistic meaning correspondences between Arabic and English. We further present an automatic means of performing this task using an unsupervised Word Sense Disambiguation system. We test the feasibility of the bootstrapping by qualitatively evaluating the projection of English word meaning definitions onto their Arabic translations. We manually evaluate 447 instances of Arabic words that correspond to English words from the SENSEVAL 3 data that were correctly sense tagged using English WordNet 1.7. The words evaluated correspond to nouns, verbs, and adjectives in English. Averaged over the Arabic verbs, adjectives, and nouns examined, we find that for 52.3% of the words the corresponding English WordNet set of definitions is sufficient as the set of definitions for the Arabic translation word; 39.96% of the Arabic words correspond to specific subsets of the WordNet definitions; and 7.8% of the Arabic words comprise supersets of their corresponding English WordNet translation definitions. These results are very encouraging, as they are similar to those obtained by researchers building EuroWordNet.
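
The three outcome categories reported above (the English definition set is sufficient, the Arabic word covers only a subset of it, or the Arabic word covers a superset of it) can be illustrated with a minimal sketch. The evaluation described in this paper was performed manually; the function name, the relation labels, and the sense identifiers below are hypothetical and serve only to make the set relations concrete.

```python
from enum import Enum


class Relation(Enum):
    """Hypothetical labels for how an Arabic word's senses relate to the
    projected English WordNet definition set."""
    SUFFICIENT = "sufficient"  # English definition set matches exactly
    SUBSET = "subset"          # Arabic word covers only some English senses
    SUPERSET = "superset"      # Arabic word has senses beyond the English set
    OTHER = "other"            # partial overlap or no overlap


def classify_projection(english_senses, arabic_senses):
    """Compare the sense set projected from English with the senses judged
    valid for the Arabic translation word (both given as iterables of ids)."""
    en, ar = set(english_senses), set(arabic_senses)
    if ar == en:
        return Relation.SUFFICIENT
    if ar < en:   # proper subset
        return Relation.SUBSET
    if ar > en:   # proper superset
        return Relation.SUPERSET
    return Relation.OTHER


# Hypothetical sense identifiers, not taken from WordNet or SENSEVAL.
english = {"s1", "s2", "s3"}
print(classify_projection(english, {"s1", "s2", "s3"}))  # Relation.SUFFICIENT
print(classify_projection(english, {"s1"}))              # Relation.SUBSET
print(classify_projection(english, {"s1", "s2", "s3", "s4"}))  # Relation.SUPERSET
```

Counting the labels returned over all evaluated word instances would give proportions of the kind reported above, assuming a human annotator supplies the set of senses that are valid for each Arabic word.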