Word Sense Disambiguation of Farsi Homographs Using Thesaurus and Corpus

This paper describes the disambiguation of Farsi homographs in unrestricted text using a thesaurus and a corpus. The proposed method is based on [1], with two differences. First, it uses collocational information to avoid collecting spurious contexts caused by polysemous words in thesaurus categories. Second, every word in the test context contributes to the score of each conceptual class, including words that do not appear in the collected contexts. Using a Farsi corpus and a Farsi thesaurus, the method correctly disambiguated 91.46% of the instances of 15 Farsi homographs. It was compared with three supervised corpus-based methods: Naive Bayes, Exemplar-based, and Decision List. Unlike the supervised methods, it needs no training data and performs well on uncommon words. The method can also be used to resolve some kinds of morphological ambiguity.
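The abstract does not give the scoring formula, so the following is only a minimal sketch under stated assumptions. It uses a Yarowsky-style thesaurus-category score (the sum of log Pr(w|C)/Pr(w) over the test context) and picks the highest-scoring category as the sense. Add-one smoothing is an assumed stand-in for how the paper lets words absent from the collected contexts contribute, and the `collocates` table is a hypothetical stand-in for the collocational filter. The function names and the toy English data are illustrative only, not the paper's Farsi corpus or thesaurus.

```python
import math
from collections import Counter


def category_counts(windows, members, collocates):
    """Count context words around occurrences of a thesaurus category's members.

    `collocates` is hypothetical: it maps a polysemous member to words that
    signal its in-category sense; windows containing none of them are skipped.
    """
    counts = Counter()
    for target, context in windows:
        if target not in members:
            continue
        required = collocates.get(target)
        if required and not required & set(context):
            continue  # likely another sense of a polysemous member: spurious context
        counts.update(context)
    return counts


def category_score(context, counts, corpus_counts, vocab_size):
    """Sum of log Pr(w|C)/Pr(w) over the test context.

    Add-one smoothing (an assumed stand-in for the paper's scheme) gives every
    context word a finite weight, even words never seen near the category.
    """
    cat_total = sum(counts.values())
    corpus_total = sum(corpus_counts.values())
    score = 0.0
    for w in context:
        p_w_given_c = (counts[w] + 1) / (cat_total + vocab_size)
        p_w = (corpus_counts[w] + 1) / (corpus_total + vocab_size)
        score += math.log(p_w_given_c / p_w)
    return score


def disambiguate(context, categories, windows, collocates):
    """Return the name of the thesaurus category (sense) with the highest score."""
    corpus_counts = Counter(tok for _, ctx in windows for tok in ctx)
    vocab_size = len(set(corpus_counts) | set(context))
    scores = {
        name: category_score(
            context, category_counts(windows, members, collocates),
            corpus_counts, vocab_size)
        for name, members in categories.items()
    }
    return max(scores, key=scores.get)


# Toy, hypothetical data: (category member, context tokens) windows from a corpus.
windows = [
    ("loan", ["money", "interest", "pay"]),
    ("deposit", ["money", "account", "cash"]),
    ("deposit", ["river", "sand", "silt"]),  # geological sense of "deposit"
    ("river", ["water", "fish", "shore"]),
    ("stream", ["water", "flow", "fish"]),
]
categories = {"FINANCE": {"loan", "deposit"}, "WATER": {"river", "stream"}}
collocates = {"deposit": {"money", "cash", "account"}}

print(disambiguate(["money", "cash", "tomorrow"], categories, windows, collocates))  # FINANCE
print(disambiguate(["fish", "water"], categories, windows, collocates))  # WATER
```

In this toy run the collocation filter keeps the "sand/silt" window out of the FINANCE counts, and the unseen word "tomorrow" still adds a term to every category's score rather than being ignored.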

Mohammad Mehdi Homayounpour, Raheleh Makki

[1] Hwee Tou Ng et al. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach, 1996, ACL.

[2] David Yarowsky et al. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French, 1994, ACL.

[3] Hwee Tou Ng et al. Exemplar-Based Word Sense Disambiguation: Some Recent Improvements, 1997, EMNLP.

[4] Tanja Gaustad et al. Linguistic Knowledge and Word Sense Disambiguation, 2004.

[5] Hinrich Schütze et al. Introduction to Information Retrieval, 2008.

[6] Aravind K. Joshi et al. 34th Annual Meeting of the Association for Computational Linguistics, 1996.

[7] David Yarowsky et al. Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora, 1992, COLING.

[8] David Yarowsky et al. Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs, 1992, ACL.

[9] Nancy Ide et al. Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art, 1998, Comput. Linguistics.

[10] David Yarowsky et al. A Method for Disambiguating Word Senses in a Large Corpus, 1992, Comput. Humanit.

[11] Lluís Màrquez i Villodre et al. Naive Bayes and Exemplar-Based Approaches to Word Sense Disambiguation Revisited, 2000, ECAI.