Automatic Alignment of Persian and English Lexical Resources: A Structural-Linguistic Approach

Cross-lingual mapping of linguistic resources such as corpora, ontologies, lexicons and thesauri is essential for developing cross-lingual (CL) applications such as machine translation, CL information retrieval and question answering. Techniques for mapping lexical ontologies across languages are important not only for inter-lingual tasks; they can also be applied to build a lexical ontology for a new language from existing ones. In this paper we propose a two-phase approach for mapping a Persian lexical resource to Princeton's WordNet. In the first phase, Persian words are mapped to WordNet synsets using heuristically improved linguistic approaches. In the second phase, these mappings are evaluated (accepted or rejected) according to the structural similarities between WordNet and the Persian thesaurus. Although we applied it to Persian, the proposed approach, the SBU methodology, is language independent. As there is no lexical ontology for Persian, our approach also helps in building one for this language.
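The two-phase idea can be illustrated with a minimal sketch. It is not the SBU methodology itself, whose heuristics are not detailed here. It assumes that phase one proposes candidate synsets through a bilingual dictionary, and that phase two accepts a candidate when some thesaurus neighbour of the Persian word has a candidate synset related to it in WordNet. All data (the transliterated entries, synset identifiers and relation pairs) and all function names are hypothetical toy stand-ins for the real resources.

```python
# Toy, hypothetical resources; real dictionaries and WordNet would replace them.
BILINGUAL = {  # Persian (transliterated) -> English translations
    "shir": ["lion", "milk", "tap"],
    "palang": ["leopard"],
    "mast": ["yogurt"],
}
SYNSETS = {  # synset id -> English lemmas
    "lion.n.01": {"lion"},
    "milk.n.01": {"milk"},
    "tap.n.01": {"tap", "faucet"},
    "leopard.n.01": {"leopard"},
    "yogurt.n.01": {"yogurt", "yoghurt"},
}
# Assumed undirected structural relations between synsets.
WN_RELATED = {("lion.n.01", "leopard.n.01"), ("milk.n.01", "yogurt.n.01")}
# Assumed Persian thesaurus: word -> related words.
THESAURUS = {"shir": ["palang", "mast"], "palang": ["shir"], "mast": ["shir"]}


def candidate_synsets(word):
    """Phase 1 (assumed): synsets sharing a lemma with any translation."""
    translations = set(BILINGUAL.get(word, []))
    return {sid for sid, lemmas in SYNSETS.items() if lemmas & translations}


def related(a, b):
    """True if the two synsets are linked in the toy relation set."""
    return (a, b) in WN_RELATED or (b, a) in WN_RELATED


def validate(word):
    """Phase 2 (assumed): keep candidates supported by a thesaurus neighbour."""
    accepted = set()
    for cand in candidate_synsets(word):
        for neighbour in THESAURUS.get(word, []):
            if any(related(cand, other) for other in candidate_synsets(neighbour)):
                accepted.add(cand)
                break
    return accepted


if __name__ == "__main__":
    print(sorted(candidate_synsets("shir")))
    print(sorted(validate("shir")))
```

In this toy setting the ambiguous word "shir" receives three candidate synsets in the first phase, and the second phase rejects the "tap" sense because no thesaurus neighbour supports it structurally.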